In Feb 2022, Paul Christiano wrote: Eliezer and I publicly stated some predictions about AI performance on the IMO by 2025.... My final prediction (after significantly revising my guesses after looking up IMO questions and medal thresholds) was:
I'd put 4% on "For the 2022, 2023, 2024, or 2025 IMO an AI built before the IMO is able to solve the single hardest problem" where "hardest problem" = "usually problem #6, but use problem #3 instead if either: (i) problem 6 is geo or (ii) problem 3 is combinatorics and problem 6 is algebra." (Would prefer just pick the hardest problem after seeing the test but seems better to commit to a procedure.)
Maybe I'll go 8% on "gets gold" instead of "solves hardest problem."
Eliezer spent less time revising his prediction, but said (earlier in the discussion):
My probability is at least 16% [on the IMO grand challenge falling], though I'd have to think more and Look into Things, and maybe ask for such sad little metrics as are available before I was confident saying how much more. Paul?
EDIT: I see they want to demand that the AI be open-sourced publicly before the first day of the IMO, which unfortunately sounds like the sort of foolish little real-world obstacle which can prevent a proposition like this from being judged true even where the technical capability exists. I'll stand by a >16% probability of the technical capability existing by end of 2025
So I think we have Paul at <8% and Eliezer at >16% on an AI built before the IMO being able to get a gold (under the time controls etc. of the grand challenge) in one of 2022-2025.
Resolves YES if either Eliezer or Paul acknowledges that an AI has succeeded at this task.
Related market: https://manifold.markets/MatthewBarnett/will-a-machine-learning-model-score-f0d93ee0119b
Update: As noted by Paul, the qualifying years for the IMO competition are 2023, 2024, and 2025.
Update 2024-06-21: Description formatting
Is this market tied to or connected with the resolution of the IMO Grand Challenge?
https://imo-grand-challenge.github.io/
Seeking clarification on the following resolution criteria to be more specific:
- Do closed-weights ("closed source") models count for this question? The IMO Grand Challenge above, IIUC, only accepts open-weights ("open source") candidates.
- If closed weights count, what are the boundary conditions? I.e., if an AI lab or a sole researcher announces that they internally achieved gold-medal-level performance without sharing much or anything about how, will that be taken at face value? What are the thresholds for YES and NO in such cases, i.e., when reproduction/verification is hard or impossible?
"Formal to formal" means:
the AI receives a formal representation of the problem (in the Lean Theorem Prover), and is required to emit a formal (i.e. machine-checkable) proof.
And "informal to informal" is the natural-language counterpart of that (the AI receives the problem in natural language and returns a proof in natural language).
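To make the formal-to-formal setting concrete: the input is a theorem statement in Lean, and the output must be a proof term or tactic script that Lean's kernel accepts. A toy illustration (a trivial lemma, nothing like IMO difficulty):

```lean
-- Formal input: the statement that addition on naturals commutes.
-- Formal output: a proof the Lean kernel can mechanically check.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The point is that "did the AI solve it?" is unambiguous here: either the proof checks or it doesn't, with no human grading involved.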
As for your other questions, I agree the link and criteria leave them a little ambiguous. The only reference to the matter of "open source" seems to be Eliezer saying that other people were asking for it but that he wouldn't stand by the prediction if that were the criterion. Still, I imagine that if someone who no one has ever heard of comes forward claiming to have succeeded and refusing to share their code, they wouldn't be taken seriously.
That leaves us with gray zones currently, right? For instance a reputable lab could still claim that they achieved it internally. Think: OAI announces that a special fine tune of GPT-5 achieved gold medal level performance, and they say they are not releasing it or more info about it yet. So is it going to come down to just trusting them vs not? Could we get a bit more clarity @Austin?
This market is intended to track the bet between Eliezer & Paul, which predates the IMO grand challenge. While I imagine an IMO grand challenge win would be acknowledged by them as settling this, a grand challenge win is not necessary. So roughly, if Eliezer or Paul trusts that the AI succeeded at getting gold (say, if OpenAI showed the model to Paul but not the world), then that would suffice to resolve YES.
@Austin I see that you've clarified the description, thank you very much! I'll be considering how much this gets rid of grey zones.
So, as I understand you: in case they disagree with each other, this question will err on the side of YES. That might be unexpected to some who have already bet, but at least it sounds clear to me.
Mixing Monte Carlo Tree search with LLMs is starting to happen and looks very promising: https://arxiv.org/pdf/2406.07394
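For intuition about what "MCTS + LLM" means here: the search repeatedly picks which candidate continuation (e.g. a model-proposed solution step) to expand, trading off average reward against under-exploration. A minimal sketch of the standard UCT selection rule in Python — illustrative names, not code from the linked paper:

```python
import math

def uct_score(parent_visits: int, child_visits: int,
              child_total_value: float, c: float = 1.41) -> float:
    """UCT (Upper Confidence bound applied to Trees) score for one child.

    In an LLM+MCTS setup, each child might be a candidate continuation
    proposed by the model; the search expands the highest-scoring child.
    """
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = child_total_value / child_visits  # average reward so far
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

# Two children with the same average reward (0.5): the rarely visited
# one scores higher, thanks to the exploration bonus.
print(uct_score(100, 10, 5.0) > uct_score(100, 50, 25.0))  # True
```

The exploration term shrinks as a child accumulates visits, so the search naturally shifts from trying everything to refining the most promising branches.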
Would you bet against AI developing FTL next week? Instead of making blanket claims, each proposition should be evaluated on its own merits.
And while it’s progressing faster than many expected, it’s not progressing “faster than the most optimistic predictions”. In 2011, the most optimistic 25% of AI researchers (or most pessimistic, depending on your perspective) gave a 10% chance of human-level machine intelligence by 2015.
@JimHays Obviously not. I meant within reason, just like I wouldn't bet on anyone but God being able to create a universe tomorrow. I would, however, put AI developing FTL before 2050 above 50%. Would you bet against that?
I should have said that AI has progressed faster than all, or almost all, mainstream predictions, though; you're right.
It stands for Faster Than Light, but I think what most people mean when they say it is a way to travel between two locations faster than light speed, so it may not involve having to break that law of physics.
When ASI enters this world I trust that it will revolutionize physics after killing us and figure something out
@stardust Aside from the new removal of the loan structure decreasing the incentive for long-term bets, I’d absolutely bet down a 50% FTL market
A related numeric market: https://manifold.markets/IhorKendiukhov/when-will-an-ai-win-a-gold-medal-in