Will a Chinese AI developer announce a model rivaling o3 performance by February 2025?
Basic
12
Ṁ430
Feb 2
21% chance

Market resolves YES if a major Chinese AI developer (e.g., Tencent, DeepSeek, Baidu, 01.AI, Alibaba, ByteDance, or others that seem unlikely to commit outright fraud) announces evaluation results for a model that tie or surpass OpenAI's o3 results from December 20th on any one of the following benchmarks:

SWE-Bench Verified: 71.7%

Codeforces: 2727 Elo

AIME 2024: 96.7%

GPQA Diamond: 87.7%

Frontier Math: 25.2%

ARC-AGI Semi-Private: 87.5%

Aggressive test-time scaling is allowed. Results should be pass@1, as this appears to be what OpenAI reported (though I'm not totally sure this makes the most sense, or what to do if the setting is ambiguous). Benchmark contamination is a concern, but this market will resolve based on stated performance, whether or not contamination is suspected.
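
For anyone who wants to check a set of announced numbers against the criteria above, here is a minimal sketch of the rule as I read it: YES if any single listed benchmark is tied or beaten. The function names and the example scores are made up for illustration; it does not attempt to verify pass@1 or judge who counts as a major developer, which remain judgment calls.

```python
# o3 results from December 20th, copied from the list in this market.
O3_THRESHOLDS = {
    "SWE-Bench Verified": 71.7,    # percent
    "Codeforces": 2727,            # Elo
    "AIME 2024": 96.7,             # percent
    "GPQA Diamond": 87.7,          # percent
    "Frontier Math": 25.2,         # percent
    "ARC-AGI Semi-Private": 87.5,  # percent
}


def benchmarks_met(reported):
    """Return the listed benchmarks on which a reported score ties or surpasses o3."""
    return [
        name
        for name, score in reported.items()
        if name in O3_THRESHOLDS and score >= O3_THRESHOLDS[name]
    ]


def resolves_yes(reported):
    """One qualifying benchmark is enough; ties count."""
    return len(benchmarks_met(reported)) > 0


# Hypothetical scores, not real results from any model.
example = {"AIME 2024": 90.0, "Codeforces": 2727}
print(benchmarks_met(example))
print(resolves_yes(example))
```

Benchmarks not in the list are ignored, so a strong score on some other evaluation would not count toward resolution under this sketch.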

bought Ṁ20 NO

QwQ 32B-preview results.
