Will a new neural architecture, or entirely different machine learning method, replace the dominance of the query, key, and value attention-based neural architectures currently dominant in large language models? Or will large language models (or more generally foundational models with different modalities) continue to scale up transformers? Fundamentally this new method must not employ layers of self-attention or cross-attention, but must show scaling laws more promising than transformer-based LLMs. This method must be commonly recognized by practitioners to be superior to transformer methods, and multiple state-of-the-art open (and closed) source models must employ this new method. From the invention of transformer, it took a few years for them to be universally utilized. However, with modern attention on foundational models, adoption of a better approach should be significantly more swift.