Will >50% of the tasks in the WebArena benchmark be solved by EOY 2024? | Manifold

14 · Ṁ350 · Jan 1 · 62% chance

In this tweet (https://twitter.com/ajeya_cotra/status/1684358475416064001?s=20), Ajeya Cotra (admirably) predicted that there's a >50% chance that >50% of the tasks in the newly announced WebArena benchmark will be solved. Note that Ajeya didn't specify that a single agent had to solve all of them, but I will resolve this market based on a single agent's score, so her prediction and this market's resolution could diverge.
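To make the possible divergence concrete, here is a minimal Python sketch with made-up agents and task IDs (all names and numbers are hypothetical, not real WebArena results). Under this market's rule only one agent's own success rate counts; under the looser reading, any task solved by at least one agent would count.

```python
def single_agent_resolves_yes(solved_by_agent: dict[str, set[int]], num_tasks: int) -> bool:
    """This market's rule: YES only if one agent alone solves more than half the tasks."""
    return any(len(solved) / num_tasks > 0.5 for solved in solved_by_agent.values())


def union_resolves_yes(solved_by_agent: dict[str, set[int]], num_tasks: int) -> bool:
    """Looser reading: count every task solved by at least one agent."""
    solved_by_any = set().union(*solved_by_agent.values())
    return len(solved_by_any) / num_tasks > 0.5


# Hypothetical toy data: 10 tasks, two agents with non-overlapping strengths.
NUM_TASKS = 10
solved = {
    "agent_a": {0, 1, 2, 3},  # 4/10 = 40%
    "agent_b": {4, 5, 6},     # 3/10 = 30%
}

print(single_agent_resolves_yes(solved, NUM_TASKS))  # False: best single agent is 40%
print(union_resolves_yes(solved, NUM_TASKS))         # True: 7/10 = 70% combined
```

The strict `>` comparison mirrors the ">50%" threshold in the title, so a score of exactly 50% would not qualify.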

This question is managed and resolved by Manifold.

#Technical AI Timelines

Any reason this blog post shouldn't qualify the market to resolve "Yes"?

The official WebArena leaderboard also now shows Jace with a >50% result.

For a baseline of the current status, here is the paper author's tweet thread:

Completing such realistic tasks is challenging. Our best GPT-4 agent achieves a limited end-to-end task success rate of 10.59%

The paper "Understanding HTML with Large Language Models" provides some evidence that bidirectional encoder-decoder models outperform decoder-only (GPT-style) models at understanding raw web page HTML, but this benchmark includes more than raw HTML:

Raw web page HTML
Pixel-based screenshot
Accessibility tree of the web page (this seems to be a subset of the HTML DOM tree)

Related questions

Will an AI achieve >85% performance on the FrontierMath benchmark before 2028?

Will any model get above human level (92%) on the Simple Bench benchmark before September 1st, 2025.

Will an AI achieve >85% performance on the FrontierMath benchmark before 2027?

AI resolves at least X% on SWE-bench assistance, by 2025?

Will openAI have the most accurate LLM across most benchmarks by EOY 2024?

Will an AI achieve >30% performance on the FrontierMath benchmark before 2026?

Will an AI score over 30% on FrontierMath Benchmark in 2025

What will be the best score on the WebArena benchmark before 2025?

40% on cybench by EOY 2024

AI resolves at least X% on SWE-bench WITH assistance, by 2028?

© Manifold Markets, Inc. • Terms + Mana-only Terms • Privacy • Rules