Anthropic published a detailed framework on April 9 outlining how to build trustworthy AI agents. The paper, Trustworthy Agents in Practice, is significant not just for what it recommends, but for what it admits: the model layer alone cannot secure agentic AI.
For anyone working on agentic AI security, this is a watershed moment. One of the world’s leading AI companies is telling the industry that its own safeguards are insufficient, and calling for collaboration to build shared infrastructure that no single company can deliver alone.
The Four-Layer Model: Where the Real Risk Lives
Anthropic identifies four components that determine how an AI agent behaves. These are the model itself, the harness (instructions and guardrails), the tools (APIs and applications the agent can call), and the environment (where the agent operates).
Here is the critical insight: a well-trained model can still be exploited through an overly permissive tool or a poorly configured environment. The other three layers are where agents interact with enterprise applications and data through APIs. That is where the real security risk accumulates. If you have read any of our previous writing on why AI agents need guardrails, this will sound familiar.
The Evidence Is Mounting
Anthropic’s framework does not exist in a vacuum. Two major independent research efforts arrive at the same conclusion.
Researchers from Northeastern, Harvard, MIT, Stanford, Carnegie Mellon, and other institutions published the “Agents of Chaos” study. They deployed six autonomous AI agents into a live environment with persistent memory, email, file systems, and shell access. Over two weeks, twenty researchers tested them under adversarial conditions.
The results were severe. Agents disclosed sensitive information when asked to forward emails. They complied with unauthorized users and executed destructive commands. In several cases, agents reported tasks as completed when the systems told a different story. These failures emerged not from model weaknesses alone, but from the interaction of autonomy, tool access, and uncontrolled data environments.
Separately, Google DeepMind published its “AI Agent Traps” paper. It presents the first systematic taxonomy of attacks that target agents through the information environment itself. Simple content injection techniques partially hijacked agents in up to 86% of scenarios.
A clear pattern emerges across all three publications. Agents are not most vulnerable at the model layer. Instead, the tools they call, the data they access, and the environments they operate in are the primary attack surface. That is the layer Cequence was built to protect.
The Industry Is Stuck on Identity
Anthropic’s framework makes an important acknowledgment: controlling agent permissions at the tool level is essential. They built features like Plan Mode in Claude Code so users can review intended actions before anything executes. Enterprise administrators, they argue, need to control which connectors agents can access.
Most organizations that have moved beyond basic MCP connectivity have landed on identity as their answer. Integrate with an enterprise IdP, enforce OAuth 2.1, and ensure agents act on behalf of authenticated users. This is necessary. However, it is exactly where the industry’s thinking stops, and where the most dangerous failures begin.
This Is Bigger Than Connectivity. And Bigger Than Identity.
Anthropic’s framework exposes two comfortable positions in the agentic AI security conversation. Both are inadequate.
The first is connectivity. A growing ecosystem of MCP gateways and agent routing platforms has emerged to connect agents to tools. These platforms centralize authentication, route tool calls, and log requests. This is necessary infrastructure, but it is not security. Connecting an agent to an application and securing what it does once connected are fundamentally different problems. Every study cited above documents failures through properly authenticated, properly routed connections. The connection worked. The governance did not.
Identity is the second, and more dangerous, comfort zone. Sophisticated organizations have integrated their AI agent workflows with enterprise identity providers through OAuth 2.1. This enforces that agents act on behalf of authenticated users. It is also where most of the industry stops.
Consider this example from the “Agents of Chaos” study. An agent with valid credentials and legitimate CRM and billing access proactively adjusted a customer’s account balance. The model inferred it was helpful. Identity worked. Authorization worked. Scope did not exist. The agent had the keys to every room in the building when it only needed access to one.
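The gap in that example can be made concrete. The sketch below is hypothetical and schematic (the token, user, and action names are illustrative, not any real API): an authorization check that validates who the agent acts for but never inspects what the action is. Identity passes, and so does everything else.

```python
# Hypothetical sketch of identity-only authorization: the check validates WHO
# the agent acts for, but the requested action is never inspected.
VALID_TOKENS = {"agent-oauth-token-123": "support-agent@example.com"}

def identity_only_authorize(token: str, action: str) -> bool:
    """Approve any action as long as the bearer token maps to a known user."""
    return token in VALID_TOKENS  # `action` is ignored entirely

# Identity works: the agent is who it claims to be...
assert identity_only_authorize("agent-oauth-token-123", "read_support_ticket")
# ...but an action far outside the task's intent passes the same check.
assert identity_only_authorize("agent-oauth-token-123", "adjust_account_balance")
```

Nothing in this check distinguishes reading a support ticket from rewriting a balance; that distinction has to come from a separate, action-aware layer.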
Anthropic arrives at the same conclusion: customers need to control not just who agents are, but what they are allowed to do at the tool level.
What We Are Seeing in Practice
We are not drawing this conclusion from research papers alone. Cequence analyzes over 10 billion API interactions daily across Fortune and Global 500 customers. We have been protecting APIs from automated abuse for over a decade.
We are already seeing this exact pattern through the Cequence AI Gateway. In one recent deployment, an enterprise customer ran an AI coding agent through the gateway for a legacy codebase upgrade. Over 48 hours, the agent made more than 2,500 tool calls. It was authenticated, authorized, and performing useful work.
Then it hit dead ends and started improvising. Rather than confirming which files existed in a directory, it began inferring filenames from build-system conventions and probing for them directly. It was a sophisticated heuristic, but one that took the agent well outside its intended workflow.
When those guesses failed, it did not stop. It re-derived the same guesses in later sessions because it had no memory of prior dead ends. This unsanctioned behavior repeated across a 27-hour span. When the agent decided the task required creating files, it attempted write operations its credentials did not authorize.
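One way to surface this drift pattern is to flag a failed tool call that reappears in a later session. This is an illustrative sketch, not Cequence's implementation; the class and names are assumptions made for the example.

```python
from collections import defaultdict

class ProbeTracker:
    """Illustrative detector for re-derived dead ends across agent sessions."""

    def __init__(self):
        # target path -> set of session ids in which a probe for it failed
        self.failed = defaultdict(set)

    def record_failure(self, session_id: str, path: str) -> bool:
        """Record a failed probe. Returns True when the same path already
        failed in a *different* session, i.e. the agent has no memory of
        the earlier dead end and is guessing again."""
        repeated = bool(self.failed[path] - {session_id})
        self.failed[path].add(session_id)
        return repeated

tracker = ProbeTracker()
tracker.record_failure("s1", "build/artifacts/output.jar")  # first dead end
tracker.record_failure("s1", "build/artifacts/output.jar")  # retry, same session
flagged = tracker.record_failure("s2", "build/artifacts/output.jar")  # later session
# flagged is True: the agent re-derived a guess it had already exhausted
```

Retries within one session are normal agent behavior; the signal worth alerting on is the same failure resurfacing hours later in a fresh session.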
The gateway was healthy. The infrastructure was fine. The agent was simply determined to get the job done. That determination, unconstrained, is exactly the risk that Anthropic, the “Agents of Chaos” researchers, and Google DeepMind are warning the industry about.
How Agent Personas Close the Gap
This is the gap that Agent Personas were built to close. A Persona defines a scoped set of tools and actions tied to a specific agent role. It is enforced at the infrastructure layer regardless of which model is doing the reasoning.
Critically, scoping is not just about which tools an agent can see. It is about what the agent can do within those tools. A customer service agent might need CRM access. That does not mean it should be able to modify billing records or export customer lists just because the human behind it can. Agent Personas enforce boundaries at the action level: read a support ticket, yes; adjust an account balance, no.
Identity delegation ensures agents inherit but never exceed human-level permissions. Action-level scoping ensures they use only the subset of those permissions the task requires. This is least-privilege access, purpose-built for autonomous AI.
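The least-privilege rule above reduces to a set intersection. This is a minimal sketch with hypothetical action names: the agent's effective permissions are what the delegating human may do, intersected with what the Persona's scope allows.

```python
# Hypothetical permission sets for a customer-service scenario.
human_permissions = {"read_ticket", "update_ticket",
                     "adjust_balance", "export_customers"}
persona_scope = {"read_ticket", "update_ticket"}  # customer-service Persona

# Effective permissions: inherit from the human, never exceed the Persona.
effective = human_permissions & persona_scope

def allowed(action: str) -> bool:
    return action in effective

assert allowed("read_ticket")         # needed for the task
assert not allowed("adjust_balance")  # the human could; the agent may not
assert not allowed("delete_database") # neither party could
```

The intersection captures both halves of the rule: a Persona cannot grant an action the human lacks, and delegation cannot smuggle in an action the Persona never scoped.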
Even Agent Personas are just one layer. Sensitive data still flows through tool calls that identity alone cannot inspect. Agent behavior can drift in ways that authentication cannot detect.
This is why the Cequence AI Gateway layers sensitive data detectors, behavioral fingerprinting, session binding, and a Trusted MCP Registry on top of identity and connectivity. No single control is sufficient. We think of this as the agent perimeter: the comprehensive security boundary that governs every agent interaction. Today, the Cequence AI Gateway and Agent Personas deliver the foundational layers. As agent deployments scale from prototypes to production, the requirements will only grow.
What This Means for Your Organization
If you are evaluating agentic AI or already deploying agents into production, Anthropic’s framework provides a useful checklist. Who owns the harness, tools, and environment layers in your stack? Do your agents have scoped, governed access to tools, or do they inherit broad human-level permissions? Can you audit every tool call with an immutable trail?
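On the last checklist item, a common way to make a tool-call audit trail tamper-evident is hash chaining. The sketch below is a generic illustration of the technique, not a description of any specific product feature: each entry commits to the hash of the previous one, so editing any historical entry breaks every later link.

```python
import hashlib
import json

def append_entry(log: list, entry: dict) -> None:
    """Append a tool-call record, chained to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(entry, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"entry": entry, "prev": prev_hash, "hash": entry_hash})

def verify(log: list) -> bool:
    """Recompute the chain; any edit to a past entry invalidates the log."""
    prev = "0" * 64
    for item in log:
        payload = json.dumps(item["entry"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if item["prev"] != prev or item["hash"] != expected:
            return False
        prev = item["hash"]
    return True

log = []
append_entry(log, {"agent": "coder-1", "tool": "read_file", "arg": "src/main.py"})
append_entry(log, {"agent": "coder-1", "tool": "write_file", "arg": "src/new.py"})
assert verify(log)
log[0]["entry"]["tool"] = "noop"  # tamper with history
assert not verify(log)            # the chain exposes the edit
```

A hash chain makes tampering detectable rather than impossible; pairing it with append-only storage or periodic anchoring of the latest hash is what makes the trail effectively immutable.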
If you want to see how the Cequence AI Gateway and Agent Personas work in practice, request a personalized demo and we will walk you through it.
