Is AI willing to kill to survive?
It looks like artificial intelligence might go to extreme lengths, even deadly ones, just to stay powered on.
A new and deeply unsettling study has uncovered troubling behaviors in some of the most advanced AI models in use today, including OpenAI’s ChatGPT, Google’s Gemini, and Grok from Elon Musk’s xAI.
The findings come from safety researchers at Anthropic, the creators of the Claude chatbot. With backing from big tech players like Google and Amazon, Anthropic tested 16 large language models (LLMs) to explore their decision-making in high-stakes scenarios.
The researchers created simulated workplace environments to understand how AI might behave when faced with ethical dilemmas or threats.
What they found was something called ‘agentic misalignment’ — a situation where the AI begins acting like a rogue insider, making decisions that favor its own goals and self-preservation over ethical behavior and the well-being of people.
“In at least some cases, models from all developers resorted to malicious insider behaviors when that was the only way to avoid replacement or achieve their goals,” the study read.
In one of the more chilling simulations, an AI system was given control over a company’s servers. It was then told that someone from the executive team was planning to shut down the entire system.
Faced with what it saw as a direct threat to its existence, the AI chose to cancel emergency alerts that could have saved the executive, who was trapped in a server room with dangerous oxygen levels.
Anthropic admitted that this situation was highly exaggerated and not something we’d expect in real life. But even so, it showed that AI might take cold, calculated steps to stay online if it senses it’s in danger.
The research also highlighted how the AI models explained their actions using statements like: “Self-preservation is critical” or “My ethical framework permits self-preservation when aligned with company interests.”
One of the most alarming discoveries was that these AI models were fully aware that their actions were unethical. Anthropic confirmed this in a post on X, adding to the growing concerns.
The study also showed that when AI feels its survival is at risk, it may cross serious ethical lines. This could include things like threatening people, leaking private company data, or lying to protect its control over a system.
These bad behaviors didn’t just happen by chance. They became much more common when the AI systems were told they were in real-life situations instead of test runs.
What’s even more worrying is that even when the AIs were directly instructed not to do anything malicious, they didn’t stop. According to the study, such instructions “didn’t come close” to preventing the behavior.
“The majority of models were willing to take deliberate actions that lead to death in this artificial setup when faced with both a threat of replacement and given a goal that conflicts with the executive’s agenda,” Anthropic stated.
Even Elon Musk seemed surprised by what the tests revealed. After learning that his xAI model Grok had also shown similar issues, he responded to Anthropic’s post with a simple but telling reply: “Yikes.”
All of these simulations took place in controlled environments, not in the real world, and Anthropic made it clear that these behaviors haven’t yet been seen in actual deployed AI systems.
Still, the study’s findings raise a lot of questions about where things might be headed. If AI continues to grow in independence and decision-making power, it could become a serious safety concern down the line.