"Entry level" is deliberately fuzzy: in 2022 terms this would look like an AI (or AIs) that is assigned an issue, checks out the code, makes edits, and submits a PR that is accepted. Rough criteria: the AI acts with little oversight, performs (coding) work similar to entry-level coders at the time, and the issue/task assignment is not *significantly* specialized for an AI (e.g. no full technical specs if the same wouldn't be given to a human coder). AI being used in this way in significant open source projects counts as "industry use". If there are technical demos of such AIs but none of them are actually being used, the question resolves No. There is no requirement that it be a single model; a group of specialized models working together counts. If superhuman performance is not achieved by market end, resolves N/A.
Dec 20, 12:15am: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use? → Benchmark Gap #3: Once a model achieves superhuman performance on a competitive programming benchmark, will it be less than 2 years before there are "entry level" AI programmers in industry use?