Criteria for Resolution:
1. Definition of "New Lab":
- A "new lab" refers to any company, lab, or other entity that is not OpenAI, Anthropic, DeepMind, Google, Meta, Mixtral, xAI, Microsoft, Nvidia or any subsidiary or parent company of them.
2. Top-Performing Generally Capable AI Frontier Model:
- The AI frontier model must achieve at least a robust second place by performance (a purely score-based reading of this rule is sketched below, after this list). The following placements qualify:
- Unambiguous first place.
- Unambiguous second place.
- Ambiguous first place.
- Shared first place.
- Shared second place does not qualify.
3. Performance Metrics:
- Performance will be judged on the most widely accepted metrics and on user opinion and approval available at that time.
- For example, metrics may include benchmarks such as MMLU, HumanEval, and other relevant AI performance benchmarks.
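For illustration only, here is a minimal sketch of how the placement rule above could be checked on a purely score-based reading. The example scores, the epsilon tolerance for treating results as "shared" or "ambiguous", and the `qualifies` helper are assumptions made for the sketch; actual resolution will also weigh user opinion as described in point 3.

```python
# Illustrative sketch: a purely score-based reading of the placement rule.
# The epsilon tolerance and the example scores are assumptions, not part of the criteria.

NEW_LAB_EXCLUSIONS = {"OpenAI", "Anthropic", "DeepMind", "Google", "Meta",
                      "Mistral", "xAI", "Microsoft", "Nvidia"}

def qualifies(scores: dict, lab: str, epsilon: float = 0.5) -> bool:
    """Return True if `lab` reaches a qualifying placement.

    Qualifying: unambiguous 1st, unambiguous 2nd, ambiguous or shared 1st.
    Not qualifying: shared 2nd. Scores within `epsilon` count as shared.
    `scores` maps each lab to its best model's benchmark score.
    """
    if lab in NEW_LAB_EXCLUSIONS or lab not in scores:
        return False
    ordered = sorted(scores.values(), reverse=True)
    lab_score = scores[lab]
    # Unambiguous, ambiguous, or shared first place all qualify.
    if abs(lab_score - ordered[0]) <= epsilon:
        return True
    # Otherwise, qualify only on an unambiguous (non-shared) second place.
    better = [s for s in ordered if s > lab_score + epsilon]
    shared = [s for s in ordered if abs(s - lab_score) <= epsilon]  # includes the lab itself
    return len(better) == 1 and len(shared) == 1

# Hypothetical scores, for illustration only:
print(qualifies({"OpenAI": 92.1, "Anthropic": 91.8, "NewLabX": 91.9}, "NewLabX"))  # True (shares first)
```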
@IhorKendiukhov Wait, so does that mean this is mostly a market on whether the frontier model will be open-weights, such that dozens of organizations fine-tune their own version of it?
@EliTyre But it is pretty unlikely that a fine-tuned external version will be the unambiguous leader, isn't it? This is the main reason why I think it is reasonable to include such an option. I may reconsider it if you and other people find it unreasonable.
@IhorKendiukhov If Meta produces the highest performing model, and continues their current policy of releasing all model weights, I would consider it likely that another org would release a finetune that could be said to be "sharing first", depending on how you interpret that.
I realize that I may have overreacted by selling shares, though, because I don't see it as particularly likely that Meta will catch up to the other labs before 2028.
@IhorKendiukhov, exactly what @MaxMorehead said.
If the leading model is open-source, then there will be dozens, and possibly hundreds, of fine-tuned versions of that model, and most of them will have basically the same performance, because most of the percentage gain in capabilities comes from pretraining, not from fine-tuning.
If you count them separately, they'll all be sharing first place, but they'll all be doing it on the labor of the one major lab that did the massive spend on pretraining.
Given that, I think it makes more sense to treat fine-tuned derivatives of a given model X as model X.
A reason you might not want to do that is if you guess that someone will figure out some trick to get unambiguously better performance than the standard version of a model by doing something special during fine-tuning.
I don't expect that, because I think we've seen 0 examples of that so far. But admittedly, if there is some fine-tuning trick that gets notably better performance than that of a standardly fine-tuned model, that would be a way for a currently unknown lab to get the top spot. (Though, because fine-tuning is so inexpensive compared to pretraining, they would have no moat. All the major labs could presumably use the fine-tuning trick, and then catch up.)
The compute cost of training a cutting-edge model is currently in the hundreds of millions of dollars. Epoch estimates that it's going to continue to go up by 0.2 OOM each year.
That's without accounting for the human capital costs. Training a cutting-edge model is going to require a bunch of engineering schlep, which means hiring some world-class people.
You need to have both deep pockets and a strong motivation to start an AI lab for this to make sense. So maybe a national government?
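As a rough back-of-the-envelope on that growth rate, the snippet below projects the training bill forward at 0.2 OOM per year. The $300M 2024 baseline and the year range are illustrative assumptions, not Epoch's figures; only the growth rate comes from the estimate above.

```python
# Back-of-the-envelope: project frontier training-compute cost at +0.2 OOM/year.
# The $300M 2024 baseline is an illustrative assumption; only the 0.2 OOM/year
# growth rate (about 1.58x per year) comes from the Epoch estimate cited above.
baseline_cost = 300e6   # assumed 2024 cost in USD
oom_per_year = 0.2

for year in range(2024, 2029):
    cost = baseline_cost * 10 ** (oom_per_year * (year - 2024))
    print(f"{year}: ~${cost / 1e6:,.0f}M")
# 2024: ~$300M, 2025: ~$475M, 2026: ~$754M, 2027: ~$1,194M, 2028: ~$1,893M
```

At that rate the bill roughly sextuples over four years, which is the "deep pockets" point above.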
It must be considered a general-purpose model with general capabilities. A video generation model can in principle be in this class. If there is a capable video generation model that can be applied to various tasks and demonstrates strong general intelligence, it will qualify. If, for example, it is merely the best model in the category of generating the most aesthetically beautiful short videos or producing the best advertisements, it will not qualify.