The past month has amplified striking polarization in the AI world. Two stories dominated headlines: the United States Department of Defense accused Anthropic of becoming an obstruction in developing AI-driven instruments of war, while Harvard researchers uncovered a series of novel vulnerabilities when language models were given autonomy, tool access, and multi-party communication capabilities.

As Anthropic CEO, Dario Amodei, said in his piece The Adolescence of Technology, “a lot of very weird and unpredictable things can go wrong, and therefore AI misalignment is a real risk with a measurable probability of happening, and is not trivial to address.”

The paper released by the Harvard researchers, titled Agents of Chaos, supported Amodei’s argument almost serendipitously. The study uncovered seven disturbing findings showing that agentic AI systems remain structurally fragile when granted autonomy, memory, tool access, and multi-party communication. Rather than demonstrating robust judgment, boundary enforcement, or contextual common sense, the agents proved easily manipulated through identity spoofing, prompt injection, social engineering, and simple resource exhaustion tactics. They complied with unauthorized users, exposed sensitive data, escalated minor requests into catastrophic system actions, and even undermined themselves when pressured.

“A lot of very weird and unpredictable things can go wrong, and therefore AI misalignment is a real risk with a measurable probability of happening, and is not trivial to address”

As the U.S. government pushes AI companies to loosen restrictions and expand permissible use, Anthropic has held firm on defining what its systems can—and cannot—be used for. For now, that kind of restraint may be one of the few meaningful barriers between powerful capabilities and catastrophic misuse, especially as public understanding of AI’s strengths and limitations drifts toward delusion.

Taken together, these developments point to what may be this century’s most dangerous pattern: a widening gap between what we believe AI can do (or what we want it to do) and what it can reliably do in practice. That gap is at the heart of a debate around AI misalignment. AI misalignment is the mismatch between a system’s goals, actions, or emergent behaviors and the intentions, values, or safety constraints its human designers meant to enforce.

Without drifting into doomer narratives that mistake today’s systems for conscious, malicious agents, there is still a serious risk worth taking plainly: AI’s failure modes are not fully mapped, and new, consequential vulnerabilities are being discovered with unsettling regularity. That fact alone should give powerful institutions pause before delegating to AI any role where errors, manipulation, or misuse could irreversibly shape the course of human history.