Will there be an LLM which scores above what a human can do in 2 hours on METR's eval suite before 2026? | Manifold



82% chance


METR has found that current frontier models score on its autonomy benchmark roughly what a human achieves when given 30 minutes. Will at least one model score at the level of a human given 2 hours before 2026?

Clarifications:

I will try to resolve this market in accordance with the current task suite. If METR makes the suite harder or easier, I will try to account for this in the resolution of this market.
If I am not able to determine the performance of frontier models at the end of 2025, this market will be resolved N/A.

Technical AI Timelines

Machine Learning


People are also trading

Will an LLM be able to solve the Self-Referential Aptitude Test before 2027?

Will an LLM improve its own ability along some important metric well beyond the best trained LLMs before 2026?

LLM Hallucination: Will an LLM score >90% on SimpleQA before 2026?

Will LLMs be able to formally verify non-trivial programs by the end of 2025?

Will an LLM report >50% score on ARC in 2025?

Will an LLM agent complete >50% of the lab tasks on the Factorio Learning Environment benchmark in 2025?

Will one of the major LLMs be capable of continual lifelong learning (learning from inference runs) by EOY 2025?

Will there be major breakthrough in LLM Continual Learning before 2026?

Will there be any text-based task that most humans can solve, but top LLMs won't? By the end of 2024

Will LLMs be better than typical white-collar workers on all computer tasks before 2026?

