If a large language model beats a super grandmaster (classical Elo above 2,700) while playing blind chess by 2028, this market resolves YES.
I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)
Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically to do so (just as humans aren't chess-playing machines). Some previous comments I made:
1- To decide whether a given program is an LLM, I'll rely on the media and the nomenclature the creators give it. If they choose to call it an LLM or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.
2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in the weights of the model. For example, it can't access a chess engine or a chess game database.
I won't bet on this market, and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.
Related questions
@MP Would you resolve YES if someone made an LLM to be a gaming master? That is, it's made and marketed to be a master at all board games, and typically outputs game states instead of conversation. Similar to how code LLMs are still LLMs, where they are expected to output code instead of conversational language.
Does the market resolve NO or N/A if a super GM doesn't play one of the latter models?
i think the “chess engine” part is difficult here. a fine-tuned llm should be okay here, since i think the main issue is you need a powerful model (e.g. llama3 or gpt-4) fine-tuned so it knows to play “good chess”. otherwise, i’m concerned the rlhf / instruct step just won’t let you easily access the behavior you want.
so, i’m at 90% if fine-tuning is allowed without counting as an engine, and 20%-30% if it is not.
@LiamThomas Someone got GPT-3.5-instruct to play at an 1800 level, do you think it's that much harder for one to get to 2800?
@LiamThomas The thing I am referring to is this: https://x.com/grantslatton/status/1703913578036904431?s=46&t=b2S4mwbwk2fTmKFQAmEEsg
where he says it never made illegal moves. Yes 2800 is much tougher than 1800, but 2028 is pretty far away and this is GPT-3.5.
@LiamThomas Sure, it's a big difference, a good club player would never beat Magnus Carlsen. But you also can't expand the brain of the good club player 100x, or train them on billions of chess games, the way you can with LLMs. The illegal moves thing I expect to be solved with smarter models and I don't think represents a fundamental limitation (I would find it pretty hard to not make illegal moves personally if all I was given was a list of moves and couldn't see a board).
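For scale, the standard Elo expected-score formula (the usual rating model, not anything specific to this market) quantifies how large the 1800-to-2800 gap actually is:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# A 1000-point gap is enormous: an 1800 against a 2800 is expected to
# score about 0.003 points per game (roughly one draw every ~160 games).
print(round(expected_score(1800, 2800), 5))  # -> 0.00315
```

So "1800 to 2800" isn't one more step of the same size as "novice to 1800"; under the rating model it's the difference between occasionally drawing and essentially never scoring.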
@dominic I don't think playing chess should be particularly easier for an LLM than any other intellectual task. If an LLM can surpass peak human chess playing, it can likely surpass peak human ability at any intellectual task, and we'll have reached the singularity at that point. Which may be true. But I think it'd be very unlikely.
@dominic i did the same and had it playing against bestfish on lichess and it held its own for 28 moves.
@dominic Don’t forget the question is not “is this possible?” but “will this happen?” One of a relatively small pool of people has to agree to a game meeting very specific conditions.
@LiamThomas Comparing humans as if they occupy a meaningful range on the universal scale feels silly. The gap in speed between me and Usain Bolt is dwarfed by the jet engine.
@JimHays That's a good point re: will it happen, but I'd only expect it to be an issue if it turns out that by 2028 the only program that can do it is some obscure fine-tune from an anonymous Twitter user. If GPT-7 or whatever can do it, probably someone will play it at some point. Even in the worst case, there's probably a way to pay, like, $50 and get a game set up against one of the streamer players.
@JimHays quite honestly it’s hard to interpret exactly what this means in the context of an llm other than something about how the multiturn is supposed to work
@JimHays oh i assumed it only applies to the model, and basically was suggesting it needs to be multi-turn, one move at a time, rather than the whole position in each pass
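As a sketch of what that multi-turn, one-move-per-pass setup could look like, here is a minimal harness. Everything in it is an assumption for illustration: the `ask_model`/`ask_human` callables, the SAN regex (which checks only the *shape* of a move, not its legality — a real harness would still need a rules engine such as python-chess to adjudicate), and the forfeit-on-unparseable-reply rule.

```python
import re
from typing import Callable, List, Optional

# SAN-shaped move token, e.g. "e4", "Nf3", "O-O", "exd5", "e8=Q".
# This is a shape check only; it does NOT verify legality.
SAN_RE = re.compile(r"\b(O-O(?:-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?)")

def extract_move(reply: str) -> Optional[str]:
    """Pull the first SAN-shaped token out of a free-form reply."""
    m = SAN_RE.search(reply)
    return m.group(1) if m else None

def play(ask_model: Callable[[List[str]], str],
         ask_human: Callable[[List[str]], str],
         max_moves: int = 200) -> List[str]:
    """Multi-turn loop: each pass sends the full move history so far and
    asks for exactly one reply move.  An unparseable reply ends the game."""
    history: List[str] = []
    for _ in range(max_moves):
        for side in (ask_human, ask_model):
            move = extract_move(side(history))
            if move is None:      # no move found: treat as a forfeit
                return history
            history.append(move)
    return history

# Example with stub players (no real LLM attached):
stub_human = lambda hist: "e4"
stub_model = lambda hist: "I resign."   # no SAN token -> game ends
print(play(stub_model, stub_human))     # -> ['e4']
```

The point of the one-move-at-a-time design is that the model sees the full game history each pass but commits to exactly one move, which is the closest analogue to how a human plays blind chess.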