If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves to YES.
I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)
Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically for that purpose (just as humans aren't chess-playing machines). Some clarifications from my previous comments:
1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators use. If they choose to call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move. But it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.
@dominic I think the more RLHFed the model is, the worse it is at chess. That's probably why 3.5 instruct is better than 4, 4o, and probably o1.
I might be wrong.
It would probably work better if the output were constrained to PGN format and the model were fine-tuned on Stockfish analysis (available in the Lichess PGN files).
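Roughly, the idea is to feed the model a PGN transcript and let it complete the next move. A minimal sketch in pure Python (`query_llm` would be a hypothetical stand-in for whatever completion API you use; these helpers just build the prompt and parse the reply):

```python
def build_pgn_prompt(moves):
    """Render a SAN move list as a PGN movetext prompt.

    When it is White to move, we append the next move number
    ("3.") so a completion model naturally continues with a
    bare SAN token rather than free-form chat.
    """
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:  # White's move: prefix the move number
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    if len(moves) % 2 == 0:  # White to move next
        parts.append(f"{len(moves) // 2 + 1}.")
    return " ".join(parts)


def extract_san(completion):
    """Pull the first SAN token out of a raw completion,
    stripping any leading move-number prefix like '12.' or '3...'."""
    token = completion.strip().split()[0]
    return token.lstrip("0123456789.")
```

For example, `build_pgn_prompt(["e4", "e5", "Nf3"])` yields `"1. e4 e5 2. Nf3"`, which a PGN-trained completion model can continue with Black's reply. You would still want to validate the returned move against the legal move list (e.g. with a library like python-chess) and retry on illegal output, since nothing constrains the model to legal moves.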
There's already a transformer at ~2700 Elo just predicting Stockfish.
@BrandonNorman If you shit in a box and it beats a grandmaster, I for one will respect whatever you call it.
@RiskComplex We already have chess engines that can beat a grandmaster. The bet here is that specifically an LLM will do it.
@JS_81 fine-tune it to do what? Recall real chess games very well? Super GMs can do that too, and more besides.
Never mind, I didn't fully read
@NeuralBets literally just strong priors and vibes given the ambitious nature of the market. you shouldn't put much stock in my bets cause i have zero contrarian expertise
@dlin007 oh. i thought it had something to do with that paper, since you replied with a trade to my comment.
@ismellpillows I don't think the market creator meant literal AGI. It would play chess well, by definition. But fair point.
I’m not really sure how AGI is defined. Current LLMs are “general”, can’t beat super GM, and not AGI, right? My understanding is that the market requires an LLM that maintains generality and can beat super GM. So, for example, if someone made an LLM that’s equal to GPT-4 in every way except super good at chess, that would qualify. But it still wouldn’t be AGI, right?
Anyway, the model in the paper isn’t general because it only plays chess
https://dynomight.net/chess/
> I can only assume that lots of other people are experimenting with recent models, getting terrible results, and then mostly not saying anything. I haven't seen anyone say explicitly that only gpt-3.5-turbo-instruct is good at chess. No other LLM is remotely close.
>
> To be fair, a year ago, many people did notice that gpt-3.5-turbo-instruct was much better than gpt-3.5-turbo. Many speculated at the time that this is because gpt-3.5-turbo was subject to additional tuning to be good at chatting.
@CaelumForder Then it would have to be a super AGI. It would have to model Stockfish (or something like it), and that is precisely what LLMs do not do, despite all the hope and hype. I'm not saying it is definitely impossible, but all the bizarre failures we see in LLMs today come from extrapolating outside their training data. Even if LLMs could beat Stockfish, they would never do it by predicting Stockfish: to do that they would have to actually emulate Stockfish, which would necessarily be far less time-efficient than Stockfish itself, so they couldn't search as deeply. Moves in GM chess games are time-limited. If an LLM were ever to beat a computer program at chess, it would be more like AlphaGo, but even that would require something that doesn't exist in current LLMs. Of course, beating a GM is easier than beating Stockfish. I think the main problem here is the misunderstanding of the sense in which LLMs "predict" the next token.