Will an autonomous agent resolve 90% of tasks on SWE-bench by 2025? | Manifold

Basic

14

Ṁ1446

Dec 31

11% chance

Resolves "Yes" if, at the time of closure, there is an entry on the SWE-bench leaderboard (https://www.swebench.com/) with a score greater than or equal to 90%.

This question is managed and resolved by Manifold.

#Technical AI Timelines

What if there's evidence that the training data is contaminated with the SWE-Bench tasks somehow?

@DavidFWatson That's an excellent question. Let's explore possibilities:

This could be included in the question, i.e. only the number on the benchmark matters, regardless of whether it was gamed.
I could wait a certain amount of time to see whether any controversy emerges. One month feels safe. The question would then resolve Yes if, one month after the deadline, I judge that there is no consensus that the number was gamed. This makes the question more informative.
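
To make the difference between the two options concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than an actual resolution procedure: the function names, the 30-day reading of "one month", and the `gamed_consensus` flag, which stands in for the market creator's own judgment.

```python
from datetime import date, timedelta
from typing import Optional

THRESHOLD_PCT = 90.0  # from the market description: score >= 90%


def resolve_score_only(top_score_pct: float) -> str:
    """Option 1: only the leaderboard number matters."""
    return "YES" if top_score_pct >= THRESHOLD_PCT else "NO"


def resolve_with_grace_period(
    top_score_pct: float,
    close_date: date,
    today: date,
    gamed_consensus: bool,   # hypothetical: creator's judgment call
    grace_days: int = 30,    # assumption: "one month" taken as 30 days
) -> Optional[str]:
    """Option 2: wait after the deadline, then check for controversy.

    Returns "YES", "NO", or None while the grace period is still running.
    """
    if top_score_pct < THRESHOLD_PCT:
        return "NO"  # threshold missed, nothing to wait for
    if today < close_date + timedelta(days=grace_days):
        return None  # still inside the waiting window
    return "NO" if gamed_consensus else "YES"
```

Under option 1 a contaminated 92% entry resolves Yes immediately. Under option 2 the same entry stays unresolved during the waiting window and resolves No if a consensus forms that the score was gamed.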

Related questions

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2027?

Will an AI SWE model score higher than 50% on SWE-bench in 2024?

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

Will an autonomous personal AI agent, capable of managing daily affairs, be available by the end of 2024?

Will an autonomous agent resolve 90% of tasks on SWE-bench by 2026?

AI resolves at least X% on SWE-bench WITH assistance, by 2028?

AI resolves at least X% on SWE-bench assistance, by 2025?

Will OpenAI models achieve ≥90% on SimpleBench by the end of 2025?

Will AutoGPT-style AI Agents mostly work before the end of 2024?
