Who will have the best LLM at the end of 2024 (as decided by ChatBot Arena)?
💎
Premium
696
Ṁ530k
Dec 31
OpenAI: 74%
xAI: 10%
Google: 9%
Anthropic: 4%
Meta: 1.4%
Other: 1%

I was browsing Twitter, and I saw a post by Karpathy positively talking about ChatBot Arena, which is a platform for ranking LLMs based on human ratings. As expected, OpenAI is holding positions 1, 2, and 3. I wonder which company will be #1 at the end of 2024.


Screenshot of the rankings table taken on the 13th of December:



@traders Based on the comments below, I think it makes sense to resolve this question based on the Elo rating in case of a tie in "rank." When I created this question, a tie was not an option, so I doubt anyone even traded based on this assumption.

I created a similar question that only uses the rank. Feel free to trade on it.
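
A minimal sketch of the tie-break rule described above, assuming each leaderboard row has a rank (1 is best, ties possible) and an Elo-style rating. The `Entry` type, the `resolve` function, and the lab names and numbers are all hypothetical and only illustrate the rule; they are not real leaderboard data.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    company: str
    model: str
    rank: int      # leaderboard rank, 1 is best; ties are possible
    rating: float  # Elo-style rating shown on the leaderboard


def resolve(entries: list[Entry]) -> str:
    """Return the winning company: best rank first, higher rating breaks ties."""
    best = min(entries, key=lambda e: (e.rank, -e.rating))
    return best.company


# Made-up rows, purely illustrative (not real Arena numbers).
example = [
    Entry("LabA", "model-a", 1, 1300.0),
    Entry("LabB", "model-b", 1, 1305.0),
    Entry("LabC", "model-c", 3, 1290.0),
]
print(resolve(example))  # LabA and LabB tie on rank; LabB has the higher rating
```

The rank-only question would need a separate rule for ties, which this sketch does not cover.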


Elon you need to try harder. Your enemies have figured out distributed training. You need to go faster.

https://x.com/elonmusk/status/1850991323010261230

bought Ṁ250 Google YES

Google reportedly releasing in December

https://9to5google.com/2024/10/25/gemini-2-0-december/

@inar same article mentions openai releasing in december too

bought Ṁ300 OpenAI NO

I'm surprised you guys don't think that any other lab can hardcode a scratchpad/think step by step prompt to their flagship models

In fact, I would be very surprised if Opus 3.5 and the next QWENs and Geminis don't ship with a more expensive version with prethinking mode

@PeterBuyukliev i think anthropic will release something that still comes short of beating openai

i wish i had an even bigger position on openai can someone please buy some no shares

just to clarify, does the o1 model count? I'm asking, because it seems that it's mostly prompt/reflection step, as opposed to the other models in the leaderboard, who are mostly rawdogging it.

@PeterBuyukliev i don't think they will add the preview model because you can easily infer it's o1 by the time it takes to respond compared to the other models, which will bias the whole evaluation and ruin the idea behind LMSYS

@PeterBuyukliev but maybe o1-mini will appear on the leaderboards since it is relatively fast and if it does then yes should count, same way the google gemini api searches the web before responding

@PeterBuyukliev ok no, both models will be included on the leaderboard according to a tweet by LMSYS, and they seem to have added a 30 sec latency for both models when one is o1, which i think is not enough to avoid bias :(

sold Ṁ1,255 OpenAI YES

I'm selling because after reviewing the status of the big 3-4 groups again, I'm not convinced the current odds really reflect the difference in these models here. Taking a new position with something else I think.

opened a Ṁ467 Google YES at 17% order

@NoahRich Bought in Google because I think its position at the time didn't reflect its real potential odds of winning.

@NoahRich IDK, Gemini feels very lame and always trailing behind the others. Something is broken in Google, I doubt they can deliver out of nowhere.

@ICRainbow I don't think it's "likely" per se, but I think it's more likely than the current odds would have us believe here on this market. If I check the Chatbot Arena responses, too....

not as big of a difference as I would've expected, as I too have generally found Gemini to be very lackluster in comparison to GPT

@NoahRich Yeah, I've seen those. I'm also a paid user of Gemini Advanced Pro Ultra Whatevs. Claude smokes it hands down for free.

opened a Ṁ2,000 xAI NO at 14% order

@jim https://x.com/elonmusk/status/1830650370336473253

Colossus is the most powerful AI training system in the world

Does anyone believe this? I would guess Google, Meta, and Microsoft all have more powerful.

@jim i tried grok 2 and it is 💩 + as you said Elon tends to over promise and under deliver. i think x should be lower than 10% but the return is low rn i won't change it

@jim i am jealous of this market /VictorLJZ/will-gpt5-be-released-before-2025 IT WAS SHOWN TO OVER 25k people

bought Ṁ250 xAI YES

@jim I believe pretty much everything Elon Musk says

@skibidist

I don't. He achieves great things quickly. But he does so in part by being overly optimistic.

It's plausible that xAI's compute cluster is bigger than Meta and Google's biggest ones. But there's almost zero chance it's bigger than Microsoft's. Because there's no way that Microsoft/OpenAI could fumble the lead that badly.

Of course if xAI is ahead of Google and Meta that's an insane achievement and makes xAI a good buy on this market (since there's a strong chance that OAI doesn't release this year).

bought Ṁ50 OpenAI NO

Those who are bidding here should note that the "best LLM" in the arena is usually not actually the best LLM.

The people using that arena are inputting simple prompts and receiving simple responses. The site limits the lengths of both if you try to use LLMs directly, as well. The actual intelligence of the models is not measured well by simple responses. Plus, the real-world impact of a model is decided almost entirely by one metric - coding, since everything else can come from code - and people are not using the arena to code.

@SteveSokolowski this is a fine rationale, but I've come to the opposite conclusion. Meta and Google both have more computing capabilities than OpenAI and imo will surpass OpenAI at some point in the indeterminate future (maybe by year's end). Regardless, since this elo determination is almost like a popularity contest, it's hard to see OpenAI's ChatGPT not being top dog in such a short time. For lots of people not even that involved in the AI realm, ChatGPT is nearly synonymous with AI lol. It's pretty impeccable branding.

@NoahRich how exactly is the elo determination a popularity contest? can you elaborate a bit please?

@SteveSokolowski valid points, chatbot arena is indeed getting less relevant over time due to the exact problems you described.

@Soli Good question! Actually I had a misunderstanding of how the elo ranking was made! I assumed that because of the human ratings there might be some bias just from name recognition.

but after your question I went to chatbot arena and learned how it works. Actually now I think maybe there's a good chance openAI could fall from top dog here within this market's timeframe

@NoahRich 👍

also fyi openai lost the leading position for a week or so to google at some point this year and tied for 2 weeks with anthropic

© Manifold Markets, Inc.