Will o3's score on the Last Exam be above 30%?
30% chance
This question is managed and resolved by Manifold.
The Last Exam appears to be primarily a knowledge benchmark, rather than a problem-solving benchmark. All frontier models score very highly on other knowledge benchmarks, but score poorly on The Last Exam. o3 is unlikely to be significantly more knowledgeable than other frontier models.
@Haiku I don’t fully agree. The benchmark was created mostly by filtering for questions that none of the frontier models (at the time) could answer.
In math, a lot of these questions are problem-solving questions, and I assume o3 is very good at problem solving.