Resolution is based on the Chatbot Arena LLM leaderboard (https://lmarena.ai), specifically the company with the highest Arena Score in the Overall category with no filters applied (i.e., without Style Control and without Show Deprecated enabled), as of April 30, 2025, 11:59 PM ET.
See also:
/Bayesian/which-company-has-best-ai-model-end (resolved)
/Bayesian/which-company-has-the-best-ai-model
/Bayesian/who-will-have-the-best-texttoimage-SO0uN6suuS
/Bayesian/who-will-have-the-best-texttovideo-AtZ0CdIc8Z
/Bayesian/which-company-has-best-ai-computer
I have been using ChatGPT to help me make diagrams in LaTeX for lectures and papers. It's very good and can get it mostly right on the first try when I prompt it with a scenario I want to visualise. I tried Gemini today. It was horrible. It did not understand the prompt, and the code it gave didn't even compile. I realise this is a small thing, but I thought the info might be helpful.
@Clue First of all, there's a chance that o3 won't be released this month, or at least that it won't be available on the arena.
Second, o1 has scored relatively poorly on the leaderboard. The latest version of GPT-4o has an Elo score of 1410, and o1 has an Elo score of only 1351.
I'm not quite sure why that is. My first guess was that OpenAI's reasoning models are tuned specifically toward math and coding problems and do less well in other areas. But their Elo score isn't that different for math and coding categories compared to the overall score.
@TimothyJohnson5c16 In my experience, ChatGPT is the one that hallucinates the most when coding; those hallucinations are super annoying.
@TimothyJohnson5c16 I hadn't considered that o3, were it to release, might underperform because of what it's designed to be good at. That changes how I feel. That said, my market already incorporates the possibility that it might not come out: https://manifold.markets/Clue/openai-o3-release-to-plus-users-in
@jim Being willing to bet only about 1000 mana at 12% implies a Kelly credence of like 12.01%… it lacks conviction
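For anyone curious how a bet size maps to an implied Kelly credence: a minimal sketch below, assuming a hypothetical bankroll of 1,000,000 mana (jim's actual balance isn't in the thread) and a simplified fixed-price binary-market model that ignores Manifold's AMM price impact.

```python
# Sketch: inferring an implied Kelly credence from a bet size.
# Assumptions (not from the thread): a hypothetical 1,000,000-mana
# bankroll, and a fixed-price binary market (no AMM slippage).

def kelly_fraction(p: float, m: float) -> float:
    """Kelly-optimal fraction of bankroll to bet YES at market
    probability m, given true credence p (valid for p > m)."""
    return (p - m) / (1 - m)

def implied_credence(bet: float, bankroll: float, m: float) -> float:
    """Invert the Kelly formula: the credence p at which betting
    this fraction of bankroll on YES is Kelly-optimal."""
    f = bet / bankroll
    return m + f * (1 - m)

# A 1,000-mana YES bet at 12% from a 1,000,000-mana bankroll:
p = implied_credence(1_000, 1_000_000, 0.12)
print(f"implied credence: {p:.4%}")  # barely above the 12% price
```

With these (made-up) numbers the implied credence comes out a hair above 12%, which is the point of the quip: a small bet relative to bankroll signals a belief barely different from the market price.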