9th March 2024
BitNet Transformer: Scaling 1-bit Transformers for Large Language Models
BitNet Transformer is an architecture that scales 1-bit Transformers for large language models. It achieves competitive performance while substantially reducing memory footprint and energy consumption compared with state-of-the-art 8-bit quantization methods and FP16 Transformer baselines.
Key Features:
- BitLinear: A drop-in replacement for PyTorch's nn.Linear layer, enabling 1-bit weights to be trained from scratch (see the sketch after this list).
- Scalable and Stable: Designed to train stably and efficiently as model size grows, making it suitable for large language models.
- Competitive Performance: Achieves competitive results in terms of perplexity and downstream task accuracy compared to baselines.
- Significant Energy Savings: Provides substantial energy cost reductions, especially as the model size scales up.
- Scaling Law: Exhibits a scaling law akin to full-precision Transformers, suggesting its potential for effective scaling to even larger language models.
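BitLinear keeps the public interface of nn.Linear while constraining the weights to 1 bit. The snippet below is a minimal sketch of how such a layer might look, assuming sign-based weight binarization, a mean-absolute-value rescaling factor, and a straight-through estimator for gradients; the class name BitLinearSketch and all implementation details are illustrative assumptions, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BitLinearSketch(nn.Module):
    """Illustrative 1-bit linear layer; not the released BitNet implementation.

    A full-precision shadow weight is kept for the optimizer, but the forward
    pass uses weights binarized to {-1, +1}. A straight-through estimator lets
    gradients flow through the non-differentiable sign() so the layer can be
    trained from scratch as a stand-in for nn.Linear.
    """

    def __init__(self, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features)) if bias else None
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)  # same init as nn.Linear

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Center the weights, then binarize; beta rescales the output so its
        # magnitude roughly matches that of a full-precision linear layer.
        w_centered = self.weight - self.weight.mean()
        beta = w_centered.abs().mean()
        w_bin = torch.sign(w_centered)
        # Straight-through estimator: forward uses w_bin, backward sees w_centered.
        w_ste = w_centered + (w_bin - w_centered).detach()
        out = F.linear(x, w_ste) * beta
        if self.bias is not None:
            out = out + self.bias
        return out


# Drop-in usage: swap an nn.Linear for the sketch inside an existing model.
layer = BitLinearSketch(512, 1024)
y = layer(torch.randn(8, 512))
print(y.shape)  # torch.Size([8, 1024])
```

Keeping a full-precision shadow weight for the optimizer while binarizing only on the forward pass is what makes training from scratch possible, as opposed to post-training quantization of an FP16 model.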
Availability: