Resolves either when OpenAI releases enough information (e.g., a technical report) or based on public consensus by EOY 2024.
Resolves to every choice that contributed to making the model stronger.
For example, if the question were about how other labs' models got smarter than their previous versions, the answers might be:
- Llama 3: data
- Claude 3: data, parameters (? judging from the fact that Opus is ~10x more expensive than Claude 2), RL, multimodality (? the multimodal training may not have improved text ability), architecture (?)
- Gemini 1.5 Pro: multimodality, architecture (long context + MoE), data (?)
For reference, from OpenAI's GPT-4o announcement:

> With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.