Will a large language model beat a super grandmaster playing chess by 2028?

If a large language model beats a super grandmaster (classical Elo above 2,700) at blind chess by 2028, this market resolves YES.
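(For context on the 2,700 threshold: under the standard Elo model, a player's expected score follows a logistic curve on a 400-point scale. A minimal sketch in Python; the function name is just illustrative, not part of the resolution criteria.)

```python
# Standard Elo expected-score formula: the score (win = 1, draw = 0.5)
# a player rated r_a expects against a player rated r_b.
def elo_expected_score(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# A 2700-rated super grandmaster against a hypothetical 2200-rated model:
print(round(elo_expected_score(2700, 2200), 3))  # → 0.947
```

In other words, even a model playing at a strong 2200 level would still be expected to score under 6% against a 2,700-rated player.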

I will ignore fun games, at my discretion. (Say, a game where Hikaru loses to ChatGPT because he played the Bongcloud.)

Some clarification (28th Mar 2023): This market grew fast with an unclear description. My idea is to check whether a general intelligence can play chess without being created specifically to do so (just as humans aren't chess-playing machines). Here are some previous comments I made.

1- To decide whether a given program is an LLM, I'll rely on the media and on the nomenclature its creators give it. If they choose to call it an LLM, or some related term, I'll consider it one. Conversely, a model that markets itself as a chess engine (or is called one by the mainstream media) is unlikely to qualify as a large language model.

2- The model can write as much as it wants to reason about the best move, but it can't have external help beyond what is already in its weights. For example, it can't access a chess engine or a chess game database.

I won't bet on this market and I will refund anyone who feels betrayed by this new description and had open bets by 28th Mar 2023. This market will require judgement.


Anyone else thinking about just making an LLM beat a super GM now? Could just fine-tune a Llama model with a multi-format regime to help it generalise… just train against Stockfish even

Which chess player will you be signing up?

Does LLM mean auto regressive without search in this context?

That was what I was assuming.

@MP Would you resolve YES if someone made an LLM to be a gaming master? That is, one made and marketed to be a master at all board games, which typically outputs game states instead of conversation. Similar to how code LLMs are still LLMs by trade, where they are expected to output code instead of conversational language.

Does the market resolve NO or N/A if a super GM doesn't play one of the later models?

Resolved no



i think the “chess engine” part is difficult here. a fine-tuned llm should be okay, since i think the main issue is that you need a powerful model (e.g. llama3 or gpt-4) fine-tuned so it knows to play “good chess”. otherwise, i’m concerned the rlhf / instruct step just won’t let you easily access the behavior you want.

so, i’m 90% if fine tuning is allowed without it counting as an engine and 20%-30% if it is not.


anyone who votes yes is either a booster or a mark. either way you don't understand either chess or LLMs.


@LiamThomas Someone got GPT-3.5-instruct to play at an 1800 level, do you think it's that much harder for one to get to 2800?

@dominic Not only is it much harder to get to 2800, GPT can't even reliably not make illegal moves.

@LiamThomas The thing I am referring to is this: https://x.com/grantslatton/status/1703913578036904431?s=46&t=b2S4mwbwk2fTmKFQAmEEsg

where he says it never made illegal moves. Yes 2800 is much tougher than 1800, but 2028 is pretty far away and this is GPT-3.5.

The difference between an 1800 and a 2800 is the difference between a good club player and Magnus Carlsen.

Also, people immediately broke it and made it start making illegal moves.

@LiamThomas Sure, it's a big difference, a good club player would never beat Magnus Carlsen. But you also can't expand the brain of the good club player 100x, or train them on billions of chess games, the way you can with LLMs. The illegal moves thing I expect to be solved with smarter models and I don't think represents a fundamental limitation (I would find it pretty hard to not make illegal moves personally if all I was given was a list of moves and couldn't see a board).
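(A sketch of that “list of moves, no board” setting: the GPT-3.5-instruct experiments reportedly fed the model a numbered, PGN-style move list and had it complete the next move. `build_prompt` here is a hypothetical helper for illustration, not the actual prompt that was used.)

```python
# Build a PGN-style movetext prompt from a flat list of SAN moves.
# The model never sees a board, only this string.
def build_prompt(moves: list[str]) -> str:
    parts = []
    for i in range(0, len(moves), 2):
        num = i // 2 + 1
        parts.append(f"{num}. " + " ".join(moves[i:i + 2]))
    # If White is to move, a trailing move number cues the completion.
    if len(moves) % 2 == 0:
        parts.append(f"{len(moves) // 2 + 1}.")
    return " ".join(parts)

print(build_prompt(["e4", "e5", "Nf3", "Nc6"]))  # → "1. e4 e5 2. Nf3 Nc6 3."
```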

@dominic I don't think playing chess should be particularly easier for a LLM than any other intellectual task. If a LLM can surpass peak human chess playing, it can likely surpass peak human ability at any intellectual task, and we'll have reached the singularity at that point. Which may be true. But I think it'd be very unlikely.

@dominic i did the same and had it playing against bestfish on lichess and it held its own for 28 moves.

@dominic Don’t forget, the question is not “is this possible?” but “will this happen?” Someone from a relatively small pool of people has to agree to a game meeting very specific conditions

@JimHays i feel a cheater on chess.com using an llm to beat a supergm would qualify

@LiamThomas Comparing humans as if they occupy a meaningful range on the universal scale feels silly. The gap in speed between me and Usain Bolt is dwarfed by the jet engine.

@JimHays That's a good point re: will it happen, but I'd only expect it to be an issue if it turns out that by 2028 the only program that can do it is some obscure fine-tune from an anonymous Twitter user. If GPT-7 or whatever can do it, probably someone will play it at some point. Even in the worst case, there's probably a way to pay, like, $50 and get a game set up against one of the streamer players.

@dominic Almost everyone seems to be overlooking that it has to be “blind” chess

@JimHays quite honestly it’s hard to interpret exactly what this means in the context of an llm other than something about how the multiturn is supposed to work

@CampbellHutcheson But it’s not so hard to interpret what it means for the human

@JimHays oh i assumed it only applies to the model and basically was suggesting it needs to be multiturn one move at a time rather than the whole position in each pass
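(A minimal sketch of that multiturn, one-move-per-pass protocol, with `query_llm` stubbed out by scripted replies rather than a real model call, and no legality checking:)

```python
# Stub standing in for an actual LLM call; in a real setup, query_llm
# would send the running move list as a prompt and parse one reply move.
SCRIPTED_REPLIES = iter(["e5", "Nc6", "Nf6"])

def query_llm(moves: list[str]) -> str:
    return next(SCRIPTED_REPLIES)

def play(white_moves: list[str]) -> list[str]:
    game: list[str] = []
    for wm in white_moves:
        game.append(wm)               # human (White) plays a move
        game.append(query_llm(game))  # model (Black) answers one move per pass
    return game

result = play(["e4", "Nf3", "Bc4"])
print(result)  # → ['e4', 'e5', 'Nf3', 'Nc6', 'Bc4', 'Nf6']
```

The only state carried between turns is the move list itself, which matches the “blind” reading where the model never sees a rendered position.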

It is a pleasure to be your mark, if that's how you feel about this arrangement
