Which of these Language Models will beat me at chess?
85%  Any model announced before 2030
77%  Any model announced before 2029
71%  Any model announced before 2028
61%  Any open-weights model announced before 2030
57%  Any model announced before 2027
42%  Any model announced before 2026
38%  GPT-5
31%  DeepSeek-V4
30%  OpenAI o3
24%  Grok 3
22%  Llama 4
22%  Claude 3.5 Opus

Which of these models will beat me at chess once released? Resolves YES if they win, NO if I win, and 50% for a draw.

I'm rated about 1900 FIDE. When each of these models is released, I'll play a game of chess against it at a rapid time control. On each move, I'll provide the model with the game state in PGN and FEN notation. If the model makes three illegal moves, it loses. Notation discrepancies such as Nbd2 vs. Nd2 will not count towards this limit.
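To make the illegal-move rule concrete, here is a minimal sketch of one way the leniency on disambiguation could be checked. It is not the market creator's actual procedure. The helper names (`strip_disambiguation`, `is_strike`, `model_forfeits`) are hypothetical, and the list of legal moves is assumed to come from some external move generator that is not shown here.

```python
import re
from typing import Sequence

# Piece moves in SAN: piece letter, optional disambiguator (file, rank, or both),
# optional capture mark, target square, optional check/mate suffix.
_PIECE_MOVE = re.compile(r"^([KQRBN])([a-h]?[1-8]?)(x?)([a-h][1-8])([+#]?)$")


def strip_disambiguation(san: str) -> str:
    """Return the SAN move without its disambiguator and check/mate suffix."""
    san = san.strip()
    match = _PIECE_MOVE.match(san)
    if match is None:
        # Pawn moves, promotions, and castling carry no disambiguator;
        # only the trailing check/mate marks are dropped.
        return san.rstrip("+#")
    piece, _disambiguator, capture, target, _suffix = match.groups()
    return f"{piece}{capture}{target}"


def is_strike(reply: str, legal_sans: Sequence[str]) -> bool:
    """True if the reply matches no legal move, even ignoring disambiguation.

    `legal_sans` is a hypothetical input: the legal moves in the current
    position, written in SAN by whatever move generator is in use.
    """
    wanted = strip_disambiguation(reply)
    return all(strip_disambiguation(move) != wanted for move in legal_sans)


def model_forfeits(strikes: int, limit: int = 3) -> bool:
    """Apply the three-illegal-moves rule described above."""
    return strikes >= limit
```

With this sketch, a reply of `Nd2` when the generator lists `Nbd2` is not counted as a strike. If two knights could reach d2, the reply would still be ambiguous and the referee would need to ask which one was meant; the sketch does not handle that step.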

Each option will stay open until the model is released, or it will resolve N/A if it's clear that the model will never be released. I'll periodically add models to this market which I find interesting. Once I play a game, I'll post the PGN in the comments before resolving. Multiple answers can resolve YES.
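The resolution rules above can be summarized in a few lines. This is only an illustrative sketch; `GameResult` and `resolve_option` are hypothetical names, and "OPEN" is a placeholder for an option whose game has not been played yet.

```python
from enum import Enum
from typing import Optional


class GameResult(Enum):
    MODEL_WIN = "model wins"
    AUTHOR_WIN = "author wins"  # includes a forfeit after three illegal moves
    DRAW = "draw"


def resolve_option(result: Optional[GameResult], never_released: bool = False) -> str:
    """Map one option's game outcome to its resolution."""
    if never_released:
        return "N/A"   # clear that the model will never be released
    if result is None:
        return "OPEN"  # model not yet released or game not yet played
    return {
        GameResult.MODEL_WIN: "YES",
        GameResult.AUTHOR_WIN: "NO",
        GameResult.DRAW: "50%",
    }[result]
```

Each option is resolved independently of the others, which is why several answers can end up YES.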

  • Update 2025-01-14 (PST) (AI summary of creator comment):

    • Model Type: Only general language models are being considered; chess-specific models are excluded.

    • Capabilities: The model must be able to output human languages and code.

bought Ṁ500 NO

Current models are... not close. Illegal moves remain a big problem.
https://www.youtube.com/watch?v=FojyYKU58cw

@AbuElBanat this was published a year ago, before the release of o1. In the game that I played against it, o1 played badly but made only one illegal move.

@mr_mino embarrassing oversight. Thanks.

Would a chess-specific model count?

@AdamCzene No, I only plan on adding general LLMs. At a minimum the model should also be able to output human languages and code.

FYI, the latest LLMs are trained on data without chess games, because such specialized token data degrades performance on other important tasks.

@mathvc if this were true, wouldn’t you expect them not to be able to play chess at all? How do you explain o1 playing a full game of chess given only FEN and PGN inputs?

I recently played a game against o1, which I won. o1 made several blunders in this game; I'd estimate its Elo at less than 1000 FIDE. Here is the PGN:

1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O Nxe4 5. Re1 Nd6 6. Nxe5 Nxe5 7. Rxe5+ Be7 8. d4 Nxb5 9. c4 Nd6 10. c5 Nc4 11. Re2 O-O 12. b3 Na5 13. Nc3 d6 14. Bf4 Bg4 15. Nd5 Bxe2 16. Qxe2 Nc6 17. cxd6 Bxd6 18. Rd1 Re8 19. Bxd6 Rxe2 20. Nxc7 Qxd6 21. Nb5 Rae8 22. Nxd6 Re1+ 23. Rxe1 Rxe1#
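For anyone who wants to read the movetext above mechanically, here is a small sketch that splits it into plies and reports which side made the final move. It assumes plain movetext with no comments or variations, and it only splits text; it does not check that the moves are legal. The function name `split_plies` is hypothetical.

```python
import re

MOVETEXT = (
    "1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O Nxe4 5. Re1 Nd6 6. Nxe5 Nxe5 "
    "7. Rxe5+ Be7 8. d4 Nxb5 9. c4 Nd6 10. c5 Nc4 11. Re2 O-O 12. b3 Na5 "
    "13. Nc3 d6 14. Bf4 Bg4 15. Nd5 Bxe2 16. Qxe2 Nc6 17. cxd6 Bxd6 "
    "18. Rd1 Re8 19. Bxd6 Rxe2 20. Nxc7 Qxd6 21. Nb5 Rae8 22. Nxd6 Re1+ "
    "23. Rxe1 Rxe1#"
)


def split_plies(movetext: str) -> list:
    """Drop move numbers such as '12.' and return the individual plies."""
    return re.sub(r"\d+\.", " ", movetext).split()


plies = split_plies(MOVETEXT)
last_ply = plies[-1]
# White makes the odd-numbered plies, Black the even-numbered ones.
last_mover = "White" if len(plies) % 2 == 1 else "Black"
ended_in_mate = last_ply.endswith("#")
print(len(plies), last_mover, last_ply, ended_in_mate)
```

The final ply is Black's and carries the mate marker `#`, which would be consistent with the poster having played Black, although the comment does not state the colors.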

© Manifold Markets, Inc.