After an AI achieves >50% performance on 15-60 minute tasks, will it take less than one year for an AI to achieve >50% performance on 1-4 hour tasks?
I will default to using reporting from OpenAI, METR, or other large AI organizations. If a compelling third-party scaffolding demonstration reports on this first, I will accept it if I am >90% confident that its results are accurate. The results need not use SWE-bench or METR's pre-existing dataset; if, e.g., a model resolves this question on Metaculus, that would obviously be sufficient. Agent/assistant tasks and code tasks both count here: if either shows a jump in under one year, this resolves Yes. I will not predict on this question.
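For anyone who wants the timing rule spelled out, here is a minimal sketch of how the one-year check could be applied. It is an illustration under stated assumptions, not the official resolution procedure: the function names, the category labels, and the dates in the usage note are all hypothetical, and "less than one year" is read as strictly before the same calendar date in the following year.

```python
from datetime import date
from typing import Optional


def within_one_year(short_date: date, long_date: date) -> bool:
    """Return True if long_date is on or after short_date and strictly
    before the one-year anniversary of short_date."""
    try:
        deadline = short_date.replace(year=short_date.year + 1)
    except ValueError:
        # Assumption: Feb 29 has no anniversary in a non-leap year, so use Mar 1.
        deadline = date(short_date.year + 1, 3, 1)
    return short_date <= long_date < deadline


def resolves_yes(
    milestones: dict[str, tuple[Optional[date], Optional[date]]]
) -> bool:
    """milestones maps a task category (hypothetical labels such as "agent"
    or "code") to a pair of dates: first >50% on 15-60 minute tasks, and
    first >50% on 1-4 hour tasks. None means not yet reached.
    Either category clearing the bar is enough for Yes."""
    for short_date, long_date in milestones.values():
        if short_date is None or long_date is None:
            continue
        if within_one_year(short_date, long_date):
            return True
    return False
```

With made-up dates, `resolves_yes({"code": (date(2024, 6, 1), None), "agent": (date(2024, 7, 1), date(2025, 3, 1))})` returns `True`, because the agent category alone shows the jump in under a year. The sketch leaves out the judgment calls, such as whether a third-party report clears the >90% confidence bar.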
Background: As of mid-2024, models are often far more efficient than humans at <15 minute tasks. However, for >15 minute tasks models remain highly inconsistent.