Resolution is based on the Chatbot Arena LLM leaderboard (https://lmarena.ai), specifically the company with the highest Arena Score in the Overall category with no filters applied (i.e., without Style Control and without Show Deprecated enabled), as of April 30, 2025, 11:59 PM ET.
See also:
/Bayesian/which-company-has-best-ai-model-end (resolved)
/Bayesian/which-company-has-the-best-ai-model
/Bayesian/who-will-have-the-best-texttoimage-SO0uN6suuS
/Bayesian/who-will-have-the-best-texttovideo-AtZ0CdIc8Z
/Bayesian/which-company-has-best-ai-computer
I have been using ChatGPT to help me make diagrams in LaTeX for lectures and papers. It's very good and can get it mostly right on the first try when I prompt it with a scenario I want to visualise. I tried Gemini today. It was horrible. It did not understand the prompt, and the code it gave didn't even compile. I realise this is a small thing, but I thought the info might be helpful.
@Clue First of all, there's a chance that o3 won't be released this month, or at least that it won't be available on the arena.
Second, o1 has scored relatively poorly on the leaderboard. The latest version of GPT-4o has an Elo score of 1410, and o1 has an Elo score of only 1351.
I'm not quite sure why that is. My first guess was that OpenAI's reasoning models are tuned specifically toward math and coding problems and do less well in other areas. But their Elo score isn't that different for math and coding categories compared to the overall score.
@TimothyJohnson5c16 In my experience, ChatGPT is the one that hallucinates the most when coding; those hallucinations are super annoying.
@TimothyJohnson5c16 I hadn't considered that o3, were it to release, might underperform because of what it's designed to be good at. That changes how I feel. That said, my market already incorporates the possibility that it might not come out: https://manifold.markets/Clue/openai-o3-release-to-plus-users-in
@jim Being willing to bet only about 1000 mana at 12% implies a Kelly credence of like 12.01%… it lacks conviction
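For anyone curious how a bet size maps to an implied Kelly credence: a minimal sketch below, assuming a hypothetical bankroll of 1,000,000 mana (jim's actual balance isn't in the thread) and a simplified fixed-price binary-market model that ignores Manifold's AMM price impact.

```python
# Sketch: inferring an implied Kelly credence from a bet size.
# Assumptions (not from the thread): a hypothetical 1,000,000-mana
# bankroll, and a fixed-price binary market (no AMM slippage).

def kelly_fraction(p: float, m: float) -> float:
    """Kelly-optimal fraction of bankroll to bet YES at market
    probability m, given true credence p (valid for p > m)."""
    return (p - m) / (1 - m)

def implied_credence(bet: float, bankroll: float, m: float) -> float:
    """Invert the Kelly formula: the credence p at which betting
    this fraction of bankroll on YES is Kelly-optimal."""
    f = bet / bankroll
    return m + f * (1 - m)

# A 1,000-mana YES bet at 12% from a 1,000,000-mana bankroll:
p = implied_credence(1_000, 1_000_000, 0.12)
print(f"implied credence: {p:.4%}")  # barely above the 12% price
```

With these (made-up) numbers the implied credence comes out a hair above 12%, which is the point of the quip: a small bet relative to bankroll signals a belief barely different from the market price.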