Mamba is a next-generation architecture that aims to address the main shortcomings of transformers: limited context length and the quadratic cost of attention, whose key/value cache also keeps growing with every token during inference. https://arxiv.org/abs/2312.00752
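For intuition, here is a toy sketch (not the actual Mamba implementation, and all dimensions and values are made up): a state-space layer carries a fixed-size hidden state forward, so inference memory stays constant no matter how long the context gets, whereas attention must hold a key/value cache that grows with every token.

```python
import numpy as np

# Toy, non-selective state-space recurrence (illustration only; real Mamba
# makes B and C input-dependent and uses a hardware-aware parallel scan).
d_state, d_model = 16, 64
rng = np.random.default_rng(0)
A = np.eye(d_state) * 0.9                          # state transition (toy values)
B = rng.standard_normal((d_state, d_model)) * 0.01
C = rng.standard_normal((d_model, d_state)) * 0.01

def ssm_step(h, x):
    """One recurrent step: memory is O(d_state), independent of context length."""
    h = A @ h + B @ x
    return h, C @ h

h = np.zeros(d_state)
for t in range(10_000):                            # context grows, state size does not
    x = rng.standard_normal(d_model)
    h, y = ssm_step(h, x)

# An attention layer at this point would be holding 10,000 key/value pairs.
```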
YES resolution requires a Mamba-based LLM to match or beat GPT-3.5 on at least 5 popular benchmarks.
Does this count as Mamba-based? It's open and easily above GPT-3.5 quality.
Looks like Gemini 1.5 uses a regular transformer rather than Mamba, yet still seems to get around these shortcomings (1M context window). I expect this will cause interest in Mamba to wane, which decreases the chance that anyone will bother training and benchmarking a Mamba LLM up to GPT-3.5 level.
@adele I remain unconvinced that the transformer architecture will be the long-term winner, given its compute- and memory-hungry nature. These are great improvements though.