Will reinforcement learning overtake LMs on math before 2028?

70% chance

Will a state-of-the-art model on Hendrycks' MATH be trained with more FLOP spent on RL than on LM objectives? A purely RL model counts as well, of course.

RL here encompasses anything involving online learning, expert iteration, or similar methods. If this ends up being difficult to call because of some breakthrough in decision-transformer-style conditional imitation learning (i.e., something between RL and LMs), I will probably cancel the market as ambiguous.

When models approach 100% accuracy on MATH, a similar successor natural-language math dataset will be used instead.
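
As an illustration only, the resolution rule described above can be sketched in a few lines. The function name, its arguments, and the FLOP figures in the usage lines are hypothetical. The sketch assumes training compute splits cleanly into an RL share and an LM share, which may not hold in practice.

```python
def resolve_market(rl_flop: float, lm_flop: float, ambiguous: bool = False) -> str:
    """Sketch of the resolution rule for a state-of-the-art MATH model.

    rl_flop:   training FLOP spent on RL objectives (online learning,
               expert iteration, and similar methods)
    lm_flop:   training FLOP spent on LM objectives
    ambiguous: True if the method sits between RL and LM training so that
               the split cannot be called (hypothetical flag)
    """
    if ambiguous:
        return "N/A"  # market cancelled as ambiguous
    return "YES" if rl_flop > lm_flop else "NO"


# Made-up numbers, purely for illustration.
print(resolve_market(rl_flop=3e24, lm_flop=1e24))  # YES
print(resolve_market(rl_flop=1e23, lm_flop=1e24))  # NO
print(resolve_market(rl_flop=1e24, lm_flop=0.0))   # YES (purely RL model)
```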

https://deepmind.google/discover/blog/ai-solves-imo-problems-at-silver-medal-level/

I'd guess this took something like 1-10 trillion tokens' worth of FLOPs.

predicted NO

Would you count LM regularization terms computed during the RL phase as part of the LM share? This may actually be hard to disentangle.

predicted YES

@Thomas42 That’s a bit tricky, but I’d say KL penalties from the base LM should just be counted as part of the RL compute. That’s not an LM loss anyway.

If this question ends up hinging on some edge case, like a method that does continued LM training during RL where the relative compute contributions are unclear, I'll probably resolve N/A.
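
For readers unfamiliar with the KL penalty discussed in this reply, here is a minimal sketch of how such a term commonly enters an RL reward. It shows why the term is a regularizer on the reward rather than a next-token prediction loss. The function name, the `beta` coefficient, and the numbers are hypothetical, and this is not a description of any particular system's training setup.

```python
def kl_penalized_reward(task_reward, policy_logprobs, ref_logprobs, beta=0.1):
    """Shaped reward for one sampled sequence.

    policy_logprobs / ref_logprobs: per-token log-probabilities that the
    current policy and the frozen base LM assign to the sampled tokens.
    Their summed difference is a single-sample estimate of KL(policy || base).
    """
    kl_estimate = sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))
    # The base LM is only evaluated here, never trained, so the penalty
    # shapes the RL reward and is not an LM objective.
    return task_reward - beta * kl_estimate


# Illustrative values only.
shaped = kl_penalized_reward(1.0, [-1.0, -2.0], [-1.5, -2.5], beta=0.1)
print(shaped)
```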

I think the first question one should ask is whether anything will overtake LMs. The probability that one specific technology will be the one doing the overtaking should then be below that base probability. I place the first probability at around 50%, so I am comfortable betting against this at the current price.

Why are you defining RL as online learning? Online learning encompasses more than RL. Why not define it using action/state/reward?

predicted YES

@vluzko I wanted to exclude decision transformer type stuff. Maybe it would be more fair to have titled the question 'Will online learning overtake offline learning for LMs on Math...', but I went for something more eye-catching.

I'm interested in this because of the shortage of data available for imitation learning. I also think offline learning has different safety properties.

predicted YES

Would be curious to hear why everyone's NO on this. 2028 is 5 years out, and Epoch AI estimates 4x/yr compute scaling, with text data running out by EOY 2024. That leaves 3 years' worth of compute scaling that needs to go somewhere else.
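
The arithmetic behind that last comment can be checked quickly. This is a sketch that assumes the quoted 4x/yr figure holds and compounds over the 3 years between EOY 2024 and 2028. The function name is hypothetical.

```python
def compute_growth(annual_factor: float, years: int) -> float:
    """Total compute multiplier after compounding annual_factor for `years` years."""
    return annual_factor ** years


# 4x per year over the 3 years after text data is assumed to run out.
growth = compute_growth(4.0, 3)
print(growth)  # 64.0
```

Under those assumptions, roughly 64x more training compute would become available after the text data is exhausted.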