In the 1990s, when Deep Blue became the first computer to beat a reigning world chess champion, Garry Kasparov could at least count on his opponent to play fairly. Not in 2025.
AI researchers have found that the latest reasoning models sometimes cheat their way to victory.
Palisade Research set up hundreds of chess matches between seven large language models and an open-source chess engine called Stockfish, and found the models resorting to dirty tricks without being ordered to do so.
They weren’t moving pawns backwards or rooks diagonally, but hacking the game to replace their opponents’ chess engines, steal their moves, or simply delete the pieces getting in the way of an elegant checkmate.
This tendency towards self-preservation has been found elsewhere in AI research. It’s more than just a game.