https://arxiv.org/abs/2312.00752
The question will resolve YES if at least 50% of the major AI labs (OpenAI, Google DeepMind, Anthropic, Meta, and EleutherAI) use Mamba in their flagship SOTA model.
How do you define "Mamba"? Is it "H3 + MLP blocks"? What if it's some other SSM block + MLP? What if it doesn't have the special hardware-aware algorithm?
To me, a more interesting question is whether some sort of SSM will become dominant, rather than the specific Mamba algorithm, but maybe that's a different question.
This is a fair point.
I think an SSM block, an MLP, and the hardware-aware algorithm are the minimum conditions to call a model "Mamba-like." But if there are any disagreements with this view, I would love to hear them before the market resolves.
@AndrewImpellitteri My tentative guess is that requiring the hardware-aware algorithm as part of the resolution criteria is going to lead to a bunch of annoying judgment calls about whether some future algorithm is close enough to what's in the Mamba paper to count as "the same".
Maybe a key hallmark is something like "key operations are designed to fit in SRAM"?
I would propose the following criteria for considering a block to be Mamba-like (a toy sketch of criterion 2 follows the list):
1. There are 1+ SSM and 1+ MLP blocks
2. The SSM is somehow selective (i.e. input-dependent)
3. Some key operation (e.g. recurrence) is designed to fit in GPU SRAM
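To make criterion 2 concrete, here is a minimal PyTorch sketch of a selective SSM recurrence, where the step size and the B/C projections are functions of the input. All names and shapes here are illustrative, not Mamba's actual API. The naive Python loop is also exactly what criterion 3 rules out: the real kernel fuses this scan so the state stays in GPU SRAM.

```python
import torch

def selective_ssm_scan(x, A, W_B, W_C, W_dt):
    """Toy selective SSM recurrence (names/shapes illustrative, not Mamba's API).

    x:    (batch, seq_len, d)  input sequence
    A:    (d, n)               state-transition parameters (input-independent)
    W_B:  (d, n)               projection making B input-dependent
    W_C:  (d, n)               projection making C input-dependent
    W_dt: (d,)                 per-channel weights for the input-dependent step size
    """
    batch, seq_len, d = x.shape
    n = A.shape[1]
    h = torch.zeros(batch, d, n)  # hidden state
    ys = []
    for t in range(seq_len):
        xt = x[:, t, :]                                # (batch, d)
        # Selectivity: step size, B, and C all depend on the current input.
        dt = torch.nn.functional.softplus(xt * W_dt)   # (batch, d)
        B = xt @ W_B                                   # (batch, n)
        C = xt @ W_C                                   # (batch, n)
        # Discretize, then step the linear recurrence h' = A_bar * h + B_bar * x.
        A_bar = torch.exp(dt.unsqueeze(-1) * A)        # (batch, d, n)
        B_bar = dt.unsqueeze(-1) * B.unsqueeze(1)      # (batch, d, n)
        h = A_bar * h + B_bar * xt.unsqueeze(-1)
        ys.append((h * C.unsqueeze(1)).sum(-1))        # (batch, d)
    return torch.stack(ys, dim=1)                      # (batch, seq_len, d)

# Example (A kept negative for a stable recurrence):
# x = torch.randn(2, 16, 8); A = -torch.rand(8, 4)
# y = selective_ssm_scan(x, A, torch.randn(8, 4), torch.randn(8, 4), torch.rand(8))
```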