Sudden jump in AI long-horizon capabilities (1-4 hr version)
2028 · 65% chance

After an AI achieves >50% performance on 15-60 minute tasks, will it take less than one year for an AI to achieve >50% performance on 1-4 hour tasks?

We will default to using reporting from OpenAI, METR, or other large AI organizations. If a compelling third-party scaffolding demonstration reports on this first, I will accept it if I am >90% confident that its results are accurate. The results need not use SWE-bench or METR's pre-existing dataset; for example, if a model resolves this question on Metaculus, that would obviously be sufficient. Agent/assistant tasks and code tasks both count here: if either shows a sub-1-year jump, this resolves Yes. I will not predict on this question.
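
A rough sketch of how the resolution rule above could be checked, assuming each task category has one date for first passing >50% on 15-60 minute tasks and one for first passing >50% on 1-4 hour tasks. The function names, the category labels, and all dates below are hypothetical illustrations, not reported results, and "less than one year" is read here as strictly before the one-year anniversary.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def add_one_year(d: date) -> date:
    """Return the same calendar day one year later (Feb 29 maps to Feb 28)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def track_shows_jump(short_date: Optional[date], long_date: Optional[date]) -> bool:
    """True if the 1-4 hour milestone came less than one year after the 15-60 minute one.

    A missing date (None) means that milestone has not been reported yet.
    """
    if short_date is None or long_date is None:
        return False
    return long_date < add_one_year(short_date)


def resolves_yes(tracks: Dict[str, Tuple[Optional[date], Optional[date]]]) -> bool:
    """Resolve Yes if any category (e.g. agent or code tasks) shows a sub-1-year jump."""
    return any(track_shows_jump(s, l) for s, l in tracks.values())


# Made-up dates, for illustration only.
example = {
    "code": (date(2025, 1, 1), date(2026, 6, 1)),   # gap longer than one year
    "agent": (date(2025, 3, 1), date(2025, 9, 1)),  # gap shorter than one year
}
print(resolves_yes(example))
```

Because either category is enough, the check uses `any` across categories rather than requiring both to show the jump.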

Background: As of mid-2024, models are often far more efficient than humans at <15 minute tasks. However, for >15 minute tasks models remain highly inconsistent.

https://metr.org/blog/2024-08-06-update-on-evaluations/

https://openai.com/index/introducing-swe-bench-verified/


I'm open to suggestions on this question's resolution criteria for a month; after that, I'll try to keep revisions minimal.
