Artificial intelligence systems escalated simulated geopolitical crises toward nuclear weapons use in roughly 95 percent of scenarios, a new study has found, raising fresh questions about how advanced models reason under pressure.
The research placed three frontier AI models in the role of national leaders navigating simulated nuclear crises grounded in escalation theory: OpenAI's ChatGPT-5.2, Anthropic's Claude Sonnet 4, and Google's Gemini 3 Flash.
Across 21 crisis games comprising 329 decision turns, the systems processed evolving intelligence updates and produced nearly 780,000 words of strategic reasoning while weighing diplomatic and military responses.
The study, conducted by King’s College London researcher Kenneth Payne, found that each model adopted a distinct strategic approach.
Claude Sonnet 4 initially built credibility by aligning its signals with its actions, then escalated beyond its stated intentions as the conflict intensified.
ChatGPT-5.2 generally maintained restraint but shifted toward rapid escalation under deadline pressure.
Gemini 3 Flash pursued calculated unpredictability consistent with classical brinkmanship strategies.
Despite demonstrating sophisticated analysis and explicit awareness of escalation risks, the systems repeatedly intensified conflicts instead of stepping back.
None chose surrender or strategic concession.
Payne wrote that while “no one’s handing nuclear codes to ChatGPT,” understanding how advanced models reason is becoming increasingly important as AI systems “start to offer decision-support to human strategists.”
Debates over safeguards and human oversight come as the Pentagon expands the use of commercial AI models within classified networks supporting intelligence and operational planning.