network_architecture by bayjarvis
- The Annotated S4 - Efficiently Modeling Long Sequences with Structured State Spaces
- Mamba Finetune with Context - Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- Implementing BitNet Transformer - BitNet: Scaling 1-bit Transformers for Large Language Models