Which company has the best AI model at the end of April? (Chatbot Arena Leaderboard)
185
1kṀ540k
May 1
Google: 62%
OpenAI: 26%
xAI: 4%
DeepSeek: 2%
Meta: 2%
Anthropic: 2%
Alibaba: 1%
Other: 1%

Resolution is based on the Chatbot Arena LLM leaderboard (https://lmarena.ai): specifically, the company with the highest Arena Score in the Overall category with no filters applied (style control off, "show deprecated" unchecked), as of April 30, 2025, 11:59 PM ET.

See also:
/Bayesian/which-company-has-best-ai-model-end (resolved)
/Bayesian/which-company-has-the-best-ai-model
/Bayesian/who-will-have-the-best-texttoimage-SO0uN6suuS
/Bayesian/who-will-have-the-best-texttovideo-AtZ0CdIc8Z
/Bayesian/which-company-has-best-ai-computer
/Bayesian/which-company-has-best-vision-ai-en

bought Ṁ50 YES

Gemini cheated in the benchmarks guys

No

@Bayesian yeah ur correct i mixed it up with llama my bad 😭

I have been using ChatGPT to help me make diagrams in LaTeX for lectures and papers. It’s very good and can get it mostly right on the first try when I prompt it with a scenario I want to visualise. I tried Gemini today. It was horrible: it did not understand the prompt, and the code it gave didn’t even compile. I realise this is a small thing, but I thought the info might be helpful.

I think this market is really really under-estimating OpenAI and I'm tremendously confused. Is there a reason to think that o3 will be outperformed by Gemini?

@Clue First of all, there's a chance that o3 won't be released this month, or at least that it won't be available on the arena.

Second, o1 has scored relatively poorly on the leaderboard. The latest version of GPT-4o has an Elo score of 1410, and o1 has an Elo score of only 1351.

I'm not quite sure why that is. My first guess was that OpenAI's reasoning models are tuned specifically toward math and coding problems and do less well in other areas. But their Elo score isn't that different for math and coding categories compared to the overall score.
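For intuition about what the 59-point gap between 1410 and 1351 means: assuming the Arena Score behaves like a standard Elo rating (logistic curve, base 10, 400-point scale; this scaling is an assumption about how the leaderboard's rating fit is expressed, not something stated in the thread), the expected head-to-head preference rate can be sketched as:

```python
def expected_win_prob(rating_a: float, rating_b: float,
                      scale: float = 400.0, base: float = 10.0) -> float:
    """Probability that model A is preferred over model B in a decisive vote,
    assuming an Elo-style logistic model (base 10, 400-point scale by default)."""
    return 1.0 / (1.0 + base ** ((rating_b - rating_a) / scale))


if __name__ == "__main__":
    # Scores quoted above: latest GPT-4o at 1410, o1 at 1351.
    p = expected_win_prob(1410, 1351)
    print(f"GPT-4o preferred over o1 in about {p:.1%} of decisive votes")
```

Under that assumption, a 59-point gap works out to roughly a 58/42 split, so the gap is meaningful but far from a blowout. Ties are ignored here.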

@TimothyJohnson5c16 In my experience, ChatGPT is the one that hallucinates the most when coding; those hallucinations are super annoying.

@TimothyJohnson5c16 I hadn't considered that o3, if it were released, might underperform because of what it's designed to be good at. That changes how I feel. That said, my statement already accounted for the possibility that it might not come out: https://manifold.markets/Clue/openai-o3-release-to-plus-users-in

bought Ṁ50 YES

@VilnisSanijs please stop commenting spam on every market you trade on

new month, june! doing them in advance seems useful, but i'm not sure; i could be convinced otherwise

meta is looking strong with "spider"

@jim meh it's not even close to 2.5

@Bayesian no.2 on leaderboard 🤷

@jim Sloptimized

@Bayesian unparsimonious

@jim It is parsimonious: it explains why people hate using it and why it doesn’t feel smart.

bought Ṁ10 NO

@EyasAyesh Feel free to fill me at better prices

opened a Ṁ1,000 NO at 30% order

I think OA's time in the sun may have come to an end. NO order at 30%.

@jim wanna put it up again? i'll fill

opened a Ṁ250 NO at 50% order
opened a Ṁ1,000 YES at 10% order

@jim Now that’s just predatory

opened a Ṁ10,000 NO at 12% order

what happened to being 12% sure?

opened a Ṁ1,000 YES at 10% order

@Bayesian Kelly criterion

bought Ṁ25 YES

@jim Being willing to bet only like 1000 manas at 12% implies a Kelly credence of like 12.01%… it lacks conviction
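The Kelly argument above can be made concrete. For a binary contract priced at p (cost of one YES share) with personal YES probability q, full Kelly stakes a fraction (q − p)/(1 − p) of bankroll on YES, or (p − q)/p on NO. Inverting that turns an observed stake into an implied credence. The sketch below assumes full Kelly, a fill at a single price (no market impact), and a hypothetical bankroll of Ṁ100,000; nobody's actual balance is stated in the thread.

```python
def kelly_fraction(belief: float, price: float, side: str) -> float:
    """Full-Kelly fraction of bankroll to stake on a binary contract.

    belief: your probability that the market resolves YES.
    price:  market probability, i.e. the cost of one YES share (pays 1 if YES).
    side:   "YES" or "NO".
    """
    if side == "YES":
        return max(0.0, (belief - price) / (1.0 - price))
    if side == "NO":
        # A NO share costs (1 - price) and pays 1 with probability (1 - belief).
        return max(0.0, (price - belief) / price)
    raise ValueError("side must be 'YES' or 'NO'")


def implied_belief(stake: float, bankroll: float, price: float, side: str) -> float:
    """Invert kelly_fraction: the credence at which `stake` is the full-Kelly bet."""
    f = stake / bankroll
    if side == "YES":
        return price + f * (1.0 - price)
    if side == "NO":
        return price - f * price
    raise ValueError("side must be 'YES' or 'NO'")


if __name__ == "__main__":
    HYPOTHETICAL_BANKROLL = 100_000  # illustrative only, not anyone's real balance
    for side in ("YES", "NO"):
        q = implied_belief(1_000, HYPOTHETICAL_BANKROLL, 0.12, side)
        print(f"Ṁ1,000 {side} at 12% -> implied credence {q:.2%}")
```

The larger the bankroll relative to the stake, the closer the implied credence sits to the market price, which is the point being made about a small bet signalling little conviction.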

@Bayesian order up

© Manifold Markets, Inc.