The Transformer model, introduced by Vaswani et al. in 2017, has been a cornerstone of deep learning, particularly for natural language processing and computer vision. However, the computational inefficiency of Transformers on long sequences, driven by the quadratic cost of self-attention, has been a growing concern.
Recently, Gu and Dao introduced Mamba, a new architecture that leverages selective state space models (SSMs). It addresses some of the key inefficiencies of Transformers by selectively propagating or forgetting information along a sequence based on the current input token. Mamba demonstrates promising results, achieving state-of-the-art performance across domains including language, audio, and genomics, sometimes outperforming Transformers of similar or larger size.
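To make the "selective propagation or forgetting" idea concrete, here is a minimal sketch of an input-dependent state space recurrence. It is a deliberate simplification, not Mamba's actual formulation (which uses a discretized SSM with input-dependent step size and a hardware-aware parallel scan); the weight names `W_A`, `W_B`, `W_C` and the gating form are illustrative assumptions.

```python
import numpy as np

def selective_ssm_scan(x, W_A, W_B, W_C, d_state=16):
    """Illustrative selective state-space recurrence (simplified, not Mamba's
    exact update rule): the retention and input terms depend on the current
    token, so the model can keep or discard state per input.

    x: (seq_len, d_model) input sequence.
    W_A, W_B, W_C: hypothetical projections for retention, input, and output.
    """
    seq_len, d_model = x.shape
    h = np.zeros(d_state)                            # state carried along the sequence
    outputs = []
    for t in range(seq_len):
        # Input-dependent gate: how much of the state to retain at this step.
        a_t = 1.0 / (1.0 + np.exp(-(x[t] @ W_A)))    # (d_state,), values in (0, 1)
        b_t = x[t] @ W_B                             # (d_state,) input projection
        h = a_t * h + (1.0 - a_t) * b_t              # selective update of the state
        outputs.append(h @ W_C)                      # (d_model,) readout
    return np.stack(outputs)

# Toy usage with random weights (shapes only, no trained parameters).
rng = np.random.default_rng(0)
seq_len, d_model, d_state = 8, 4, 16
x = rng.normal(size=(seq_len, d_model))
W_A = rng.normal(size=(d_model, d_state))
W_B = rng.normal(size=(d_model, d_state))
W_C = rng.normal(size=(d_state, d_model))
y = selective_ssm_scan(x, W_A, W_B, W_C)
print(y.shape)  # (8, 4)
```

The key property the sketch shows is that the state update coefficients are computed from the current token rather than being fixed, which is what allows content-based forgetting while keeping per-step cost linear in sequence length.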
This question will resolve positively if, by Jan 1, 2026, credible evidence exists that a state space model has been trained using 1.00E26 FLOP or more.
This threshold is chosen because training a 1e26 FLOP model will likely cost many millions of dollars, even in 2025, and is therefore a useful proxy for 'frontier AI labs making large bets on SSMs'.
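As a rough illustration of why the threshold implies a large investment, the back-of-the-envelope calculation below converts 1e26 FLOP into accelerator-hours and dollars. The per-accelerator throughput, utilization, and hourly price are assumptions for the sake of the example, not figures from the question.

```python
# Back-of-the-envelope cost of a 1e26 FLOP training run.
# All constants below are illustrative assumptions; actual throughput,
# utilization, and prices vary widely by hardware, scale, and provider.
TOTAL_FLOP = 1e26
PEAK_FLOP_PER_SEC = 1e15   # assumed ~1 PFLOP/s per accelerator (BF16, dense)
UTILIZATION = 0.4          # assumed model FLOP utilization
PRICE_PER_GPU_HOUR = 2.0   # assumed USD rental price per accelerator-hour

gpu_seconds = TOTAL_FLOP / (PEAK_FLOP_PER_SEC * UTILIZATION)
gpu_hours = gpu_seconds / 3600
cost_usd = gpu_hours * PRICE_PER_GPU_HOUR
print(f"{gpu_hours:.2e} GPU-hours, ~${cost_usd / 1e6:.0f}M")
# -> 6.94e+07 GPU-hours, ~$139M under these assumptions
```

Even with owned hardware or steep discounts, the implied spend sits well within "many millions of dollars", which is why crossing 1e26 FLOP is treated here as a signal of a serious, frontier-scale commitment to SSMs.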