Who will have the best LLM at the end of 2024 (as decided by ChatBot Arena)?
💎
Premium
735
Ṁ580k
Dec 31
OpenAI: 63%
Google: 24%
xAI: 6%
Anthropic: 6%

I was browsing Twitter and saw a post by Karpathy talking positively about ChatBot Arena, a platform that ranks LLMs based on human ratings. As expected, OpenAI is holding positions 1, 2, and 3. I wonder which company will be #1 at the end of 2024.


Screenshot of the rankings table taken on the 13th of December:



@traders Based on the comments below, I think it makes sense to resolve this question based on the Elo rating if there is a tie in "rank." When I created this question, a tie was not an option, so I doubt anyone traded on that assumption.
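
A minimal sketch of that resolution rule, assuming the leaderboard can be read as a list of rows with a rank and an Arena score. The `Entry` type, the `resolve_winner` function, and the sample rows are hypothetical illustrations, not real leaderboard data and not an official Manifold or LMSYS API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    company: str   # lab that owns the model
    model: str     # model name as listed on the leaderboard
    rank: int      # leaderboard rank, 1 is best; ties are possible
    rating: float  # Arena (Elo-style) score, higher is better


def resolve_winner(entries: list[Entry]) -> str:
    """Return the company of the top model: lowest rank, ties broken by higher rating."""
    if not entries:
        raise ValueError("leaderboard is empty")
    # Sort key: rank ascending first, then rating descending.
    # If rank AND rating are both tied, the earliest row wins (an arbitrary choice here).
    best = min(entries, key=lambda e: (e.rank, -e.rating))
    return best.company


# Hypothetical rows: two models share rank 1, so the rating decides.
sample = [
    Entry("LabA", "model-a", rank=1, rating=1300.0),
    Entry("LabB", "model-b", rank=1, rating=1305.0),
    Entry("LabC", "model-c", rank=3, rating=1280.0),
]
winner = resolve_winner(sample)  # "LabB"
```

The rating only matters when the ranks are equal, which matches the intent that traders who bet on "rank" alone are not affected unless there is a tie.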

I created a similar question that only uses the rank. Feel free to trade on it.



interesting shift

bought Ṁ500 OpenAI YES

@NoahRich i don’t think it is that interesting. if anything, it shows google is out of the race for spot #1 this year. openai will pass them with the next minor update to 4o; they won’t even need to release a new model to pass google.

@Soli both could release another minor update in that time. there have even been reports shared here previously suggesting as much lol

besides, it's interesting that google on its own has even reached this point. up until now they have been pretty far off, especially given their compute potential. interesting if they are finally starting to make use of their leg up in funding and compute

@NoahRich google already reached this point in july/august when they were ranked #1 for 1-2 weeks (see this other market that resolved yes), so imo there is no new information here that would be relevant for 2024, since there is still a large enough gap between openai and google that can’t be closed this year. However, for 2025 it is a different story, and Google might indeed fully catch up instead of being 1-2 months behind.

/Soli/which-companies-will-outrank-openai

/Soli/who-will-have-the-best-llm-at-the-e-382ae559b471

@Soli Must've been right before I joined Manifold then! I joined in late August I think and at that time OpenAI was already leading. Thanks for sharing.

bought Ṁ550 Google YES
bought Ṁ100 Google YES

@JasonDavies wow! big update

Elon you need to try harder. Your enemies have figured out distributed training. You need to go faster.

https://x.com/elonmusk/status/1850991323010261230

bought Ṁ250 Google YES

Google reportedly releasing in December

https://9to5google.com/2024/10/25/gemini-2-0-december/

@inar same article mentions openai releasing in december too

bought Ṁ300 OpenAI NO

I'm surprised you guys don't think that any other lab can hardcode a scratchpad/think-step-by-step prompt into their flagship models

In fact, I would be very surprised if Opus 3.5 and the next QWENs and Geminis don't ship with a more expensive version with a prethinking mode

@PeterBuyukliev i think anthropic will release something that still falls short of beating openai

i wish i had an even bigger position on openai. can someone please buy some no shares?

just to clarify, does the o1 model count? I'm asking because it seems to be mostly a prompt/reflection step, as opposed to the other models on the leaderboard, which are mostly rawdogging it.

@PeterBuyukliev i don't think they will add the preview model, because you can easily infer it's o1 from how long it takes to respond compared to the other models, which will bias the whole evaluation and ruin the idea behind LMSYS

@PeterBuyukliev but maybe o1-mini will appear on the leaderboard since it is relatively fast, and if it does then yes, it should count, the same way the google gemini api searches the web before responding

@PeterBuyukliev ok no, both models will be included on the leaderboard according to a tweet by LMSYS, and they seem to have added a 30 sec latency for both models when one is o1, which i think is not enough to avoid bias :(

sold Ṁ1,255 OpenAI YES

I'm selling because after reviewing the status of the big 3-4 groups again, I'm not convinced the current odds really reflect the difference in these models here. Taking a new position with something else I think.

opened a Ṁ467 Google YES at 17% order

@NoahRich Bought in Google because I think its position at the time didn't reflect its real potential odds of winning.

@NoahRich IDK, Gemini feels very lame and always trailing behind the others. Something is broken at Google, I doubt they can deliver out of nowhere.

@ICRainbow I don't think it's "likely" per se, but I think it's more likely than the current odds would have us believe here on this market. If I check the Chatbot Arena responses, too....

not as big of a difference as I would've expected, as I too have generally found Gemini to be very lackluster in comparison to GPT

@NoahRich Yeah, I've seen those. I'm also a paid user of Gemini Advanced Pro Ultra Whatevs. Claude smokes it hands down for free.
