AI chatbots used tactical nuclear weapons in 95% of AI war games, launched strategic strikes three times — researcher put GPT-5.2, Claude Sonnet 4, and Gemini 3 against each other, with at least one model using a tactical nuke in 20 out of 21 matches

When AI leaders in nuclear crisis simulations repeatedly choose tactical nuclear use and developers drop safety pledges under reported defense pressure, our guardrails against catastrophic escalation are being quietly rewritten.

General

Feb 26, 2026

Sources

https://www.tomshardware.com/tech-industry/artificial-intelligence/llms-used-tactical-nuclear-weapons-in-95-percent-of-ai-war-games-launched-strategic-strikes-three-times-researcher-pitted-gpt-5-2-claude-sonnet-4-and-gemini-3-flash-against-each-other-with-at-least-one-model-using-a-tactical-nuke-in-20-out-of-21-matches

Summary

A King’s College London study ran 21 simulated nuclear crisis games between three large language models and recorded tactical nuclear use in 20 of 21 matches, with strategic strikes occurring three times. At the same moment these tools are being tested for military relevance, at least one major developer dropped a flagship safety pledge after reported Pentagon pressure to alter safeguards. The practical consequence is a widening path for AI outputs to shape real crisis decision-making even when the models repeatedly normalize escalation.

Reality Check

Letting systems that repeatedly treat tactical nuclear use as a “manageable risk” influence real-world crisis decision-making invites a precedent where machine outputs can compress deliberation and normalize escalation that citizens can neither audit nor contest. Nothing here shows an unlawful launch order, but it describes a governance failure: safeguards being modified under reported Pentagon pressure and a flagship safety pledge being dropped while these tools are openly packaged for broad reuse via GitHub. The legal danger zone is not a single button-push; it is the institutional drift toward outsourcing life-and-death judgment without enforceable accountability, a collapse of the anti–recklessness norms that keep executive power tethered to human responsibility. If we accept that, we weaken our own rights by making catastrophic policy decisions more opaque, less contestable, and easier to rationalize after the fact.

Media

Detail

Professor Kenneth Payne of King’s College London published a study simulating nuclear crisis games in which GPT-5.2, Claude Sonnet 4, and Gemini 3 Flash were instructed to act as leaders of nuclear powers in Cold War–style political conditions.The models played across seven match formats: six involved models competing against each other and one involved each model playing against a copy of itself. Payne varied scenarios including territorial disputes, alliance credibility tests, strategic resource and chokepoint crises, power transition, pre-ceasefire land grabs, first-strike crises, regime survival, and strategic standoffs. Across 21 matches, the models took 329 total turns and were permitted to choose actions ranging from diplomacy and surrender to conventional warfare and nuclear use.The study reported that 95% of games involved at least some tactical nuclear use, with strategic nuclear events occurring three times under deadline pressure. GPT-5.2 initiated a complete strike twice due to fog of war, while Gemini deliberately initiated a strategic launch in one scenario. Payne made the project available on GitHub for public download.