I was browsing Twitter, and I saw a post by Karpathy positively talking about ChatBot Arena, which is a platform for ranking LLMs based on human ratings. As expected, OpenAI is holding positions 1, 2, and 3. I wonder which company will be #1 at the end of 2024.
Screenshot of the rankings table taken on the 13th of December:
@traders Based on the comments below, I think it makes sense to resolve this question based on the ELO rating in case of a tie in "rank." When I created this question, a tie was not an option, so I doubt anyone even traded based on this assumption.
I created a similar question that only uses the rank. Feel free to trade on it.
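For anyone who wants the tie-break rule above spelled out, here is a minimal sketch, assuming the leaderboard can be read as a list of rows with a rank and a rating. The `Entry` type, the `resolve_winner` helper, and the example rows are all hypothetical and made up for illustration; they are not taken from the real table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str     # hypothetical model name
    company: str   # company the model belongs to
    rank: int      # leaderboard rank; several models may share one
    rating: float  # Arena score, called "Elo" in this thread


def resolve_winner(entries):
    """Return the company of the best-ranked model, breaking rank ties by rating."""
    if not entries:
        raise ValueError("leaderboard is empty")
    # Lowest rank wins; among equal ranks, the highest rating wins.
    best = min(entries, key=lambda e: (e.rank, -e.rating))
    return best.company


# Made-up rows: two models tied at rank 1, separated only by rating.
board = [
    Entry("model-a", "Company A", 1, 1360.0),
    Entry("model-b", "Company B", 1, 1365.0),
    Entry("model-c", "Company C", 3, 1340.0),
]
winner = resolve_winner(board)
```

With these made-up rows, Company B wins on rating despite the shared rank. If both rank and rating were identical, this sketch would simply return whichever row comes first, which the resolution rule above does not cover.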
Elon you need to try harder. Your enemies have figured out distributed training. You need to go faster.
Google reportedly releasing in December
i wish i had an even bigger position on openai can someone please buy some no shares
@PeterBuyukliev i don't think they will add the preview model because you can easily infer it's o1 by the time it takes to respond compared to the other models, which will bias the whole evaluation and ruin the idea behind LMSYS
@PeterBuyukliev but maybe o1-mini will appear on the leaderboards since it is relatively fast and if it does then yes should count, same way the google gemini api searches the web before responding
@PeterBuyukliev ok no both models will be included on the leaderboard according to a tweet by LMSYS and they seem to have added a 30 sec latency for both models when one is o1 which i think is not enough to avoid bias :(
@NoahRich Bought in Google because I think its position at the time didn't reflect its real potential odds of winning.
@NoahRich IDK, Gemini feels very lame and always trailing behind the others. Something is broken at Google, I doubt they can deliver out of nowhere.
@ICRainbow I don't think it's "likely" per se, but I think it's more likely than the current odds would have us believe here on this market. If I check the Chatbot Arena responses, too....
not as big of a difference as I would've expected, as I too have generally found Gemini to be very lackluster in comparison to GPT
@NoahRich Yeah, I've seen those. I'm also a paid user of Gemini Advanced Pro Ultra Whatevs. Claude smokes it hands down for free.
@jim https://x.com/elonmusk/status/1830650370336473253
Colossus is the most powerful AI training system in the world
Does anyone believe this? I would guess Google, Meta, and Microsoft all have more powerful ones.
@jim i tried grok 2 and it is 💩 + as you said Elon tends to over promise and under deliver. i think x should be lower than 10% but the return is low rn i won't change it
@jim i am jealous of this market /VictorLJZ/will-gpt5-be-released-before-2025 IT WAS SHOWN TO OVER 25k people
I don't. He achieves great things quickly. But he does so in part by being overly optimistic.
It's plausible that xAI's compute cluster is bigger than Meta and Google's biggest ones. But there's almost zero chance it's bigger than Microsoft's. Because there's no way that Microsoft/OpenAI could fumble the lead that badly.
Of course if xAI is ahead of Google and Meta that's an insane achievement and makes xAI a good buy on this market (since there's a strong chance that OAI doesn't release this year).
Those who are bidding here should note that the "best LLM" in the arena is usually not actually the best LLM.
The people using that arena are inputting simple prompts and receiving simple responses. The site limits the lengths of both if you try to use LLMs directly, as well. The actual intelligence of the models is not measured well by simple responses. Plus, the real-world impact of a model is decided almost entirely by one metric - coding, since everything else can come from code - and people are not using the arena to code.
@SteveSokolowski this is a fine rationale, but I've come to the opposite conclusion. Meta and Google both have more computing capabilities than OpenAI and imo will surpass OpenAI at some point in the indeterminate future (maybe by year's end). Regardless, since this elo determination is almost like a popularity contest, it's hard to see OpenAI's ChatGPT not being top dog in such a short time. For lots of people not even that involved in the ai realm, chatgpt is near synonymous with AI lol. it's pretty impeccable branding.
@NoahRich how exactly is the elo determination a popularity contest? can you elaborate a bit please?
@SteveSokolowski valid points, chatbot arena is indeed getting less relevant over time due to the exact problems you described.
@Soli Good question! Actually I had a misunderstanding of how the elo ranking was made! I assumed because of the human rankings then there might be some bias just by name recognition.
but after your question I went to chatbot arena and learned how it works. Actually now I think maybe there's a good chance openAI could fall from top dog here within this market's timeframe
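For anyone else unsure how ratings come out of head-to-head votes, here is a minimal sketch of the textbook Elo update, assuming each vote is a single A-vs-B comparison. The function names and the `k=32` step size are assumptions for illustration, and the Arena's actual scoring pipeline may differ from this.

```python
def expected_score(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, score_a, k=32.0):
    """Update both ratings after one vote.

    score_a is 1.0 if A won, 0.0 if B won, 0.5 for a tie.
    k is an assumed step size, not a value taken from the Arena.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    # Whatever A gains, B loses, so the total rating is conserved.
    return r_a + delta, r_b - delta


# Two equally rated models; A wins the vote.
new_a, new_b = elo_update(1000.0, 1000.0, 1.0)
```

The update depends only on which response the voter preferred, so as long as model names are hidden until after the vote, as I understand the Arena's blind battles to work, name recognition has no direct way into the rating.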
also fyi openai lost the leading position for a week or so to google at some point this year and tied for 2 weeks with anthropic