Will o1 (not preview) achieve a better score on LiveBench coding than Claude 3.5 Sonnet 10/22? | Manifold

Ṁ75

Jan 1

75% chance

Per LiveBench.ai, Claude 3.5 Sonnet (10/22) scores 67.13 on coding, while o1-preview scores only 50.85.

Resolves when o1 is added to the LiveBench leaderboard.

This question is managed and resolved by Manifold.

#Chatbot Arena Leaderboard

Related questions

Before February 2025, will a Gemini model exceed Claude 3.5 Sonnet 10/22's Global Average score on LiveBench?

What SimpleBench percentile range will full o1 achieve?

Will Claude 3.5 Opus be able to draw me in tic-tac-toe while playing as O at least 1/3 of the time?

Will I judge GPT-5 to be smarter than o1 (not preview) after both are released?

What will Claude 3.5 Opus's reported 0-shot performance on GPQA Diamond be upon release?

Will Claude 3.5 Opus beat OpenAI's best released model on the arena.lmsys.org leaderboard?

How well will OpenAI's o1 (not o1-preview) do on the ARC prize when it's released if tested?

Before February 2025, will a Gemini model exceed Claude 3.5 Sonnet 10/22's Global Average score on Simple Bench?

Will GPT-5 perform better than o1 (not preview) at AIME 2024, Codeforces elo, GPQA, or the 2024 ioi?

Is Claude 3.5 Sonnet a distilled or quantized version of a larger model?
