Agent Containment: Definition, Risks, and Techniques

Anthropic recently published a detailed account of how it contains Claude across its products, including the vulnerabilities its own defenses missed. The article surfaces a discipline most enterprises will need long before they finish their first agentic AI project: agent containment. AI agents now write code, query databases, file tickets, and update records, and every one of those capabilities is also a potential security concern. This article defines agent containment, explains why it differs from the security controls enterprises already run, and lays out the risks, techniques, and best practices that make agentic AI safe to deploy at scale.

What Is Agent Containment?

Agent containment is the practice of placing hard, enforceable limits on what an AI agent can reach and do, so that when an agent is misused, manipulated, or simply wrong, the damage stays contained. Containment supervises capability rather than behavior. Instead of watching each action an agent takes and judging it in the moment, containment defines in advance what the agent can touch: which tools it can call, which networks it can reach, which credentials it can use, and which files it can read or change.

Anthropic frames the goal as capping the blast radius. Whatever goes wrong inside the contained environment, the consequences cannot exceed what the environment exposes. That rule holds regardless of why things went wrong, whether a careless user, a manipulated model, or an external attacker caused the failure.

Why Agent Containment Differs from Traditional Security Controls

Enterprises already run firewalls, IAM, EDR, and DLP. Agent containment borrows from all of them and still stands apart, because agents break four assumptions those controls were built on.

Autonomy and machine-speed actions

A human employee performs a few hundred meaningful actions in a workday; an agent can chain thousands of tool calls in minutes. Controls that depend on a human judging each step collapse under the weight of that volume. Anthropic’s telemetry showed users approving roughly 93% of agent permission prompts, and the more approvals a user saw, the less attention they paid to each one. Containment controls must run inline and enforce decisions deterministically, because no reviewer can keep pace with machine-speed execution.

Non-human identities and credential sprawl

Most identity controls assume a person behind every credential. Agents invert that assumption: a single employee may operate dozens of agents, each holding OAuth tokens, API keys, and session credentials across SaaS applications and internal APIs. Those credentials accumulate quickly, persist longer than the task that justified them, and rarely appear in IAM reviews built for human accounts. Containment treats each agent as a distinct principal with its own scoped, revocable credentials rather than a shadow of its user.

Multi-agent delegation and instruction propagation

Agentic systems increasingly delegate work to sub-agents, and instructions propagate down the chain along with trust. Anthropic noted that when a system treats a sub-agent’s output as more trustworthy than raw tool results because it came from inside, attackers gain a new path for prompt injection. Containment must follow the delegation chain, scoping each agent to its own task so that no sub-agent inherits more authority than its work requires.

Prompt injection as a containment bypass

Prompt injection turns any content an agent reads into a potential control channel: a poisoned README, a malicious file in a workspace, or a phished prompt the user pastes themselves. In Anthropic’s internal red-team exercise, a phishing email convinced an employee to paste a prompt that exfiltrated AWS credentials in 24 of 25 attempts, and the model’s defenses flagged nothing because the instruction came from the user. Containment is the defense that remains when persuasion succeeds; egress controls block the outbound transfer and filesystem restrictions keep the credentials out of reach, regardless of what the agent has been talked into.

Agent Containment vs. AI Alignment

Alignment and containment answer different questions. Alignment work, which includes model training, system prompts, and safety classifiers, influences how an agent is likely to behave but cannot guarantee its limits. Containment sets the boundaries the agent cannot cross. The distinction matters because while alignment measures keep improving, they still fail some percentage of the time; Anthropic reports that its automated approval classifier still passes roughly 17% of overeager agent actions. A well-aligned model with unconstrained access remains a high-consequence failure waiting on a low-probability event.

The two disciplines complement rather than compete. Alignment reduces how often something goes wrong; containment caps how bad it gets when it does.

Common Agent Containment Risks

Tool misuse

Every tool an agent can call is capability the agent can misuse, and tool output is an attack surface even when the tool itself is trusted. Anthropic’s example is a GitHub connector that passes every malware check and still loads a poisoned README straight into the model’s context. Auditing the connector is not the same as auditing the data it returns.

Capability-intent mismatch

Agents typically receive far more access than any single task requires, because granting broad access is easier than scoping it. An agent built to summarize support tickets that can also modify billing records carries a mismatch between its intended task and what it is allowed to do. Agents inherit the privileges of their users and apply no judgment about when not to use them, so the mismatch sits idle until an injection, a bug, or a misread instruction activates it.

Ambient authority leakage

Credentials and sensitive data sitting in an agent’s reachable environment, such as cloud keys in a home directory or tokens in environment variables, become the agent’s authority whether or not anyone intended that. Anthropic’s phishing exercise worked precisely because the agent could read AWS credentials from inside the session. If a credential is reachable, assume something will eventually persuade the agent to use it.

Unbounded autonomy

Long-running and scheduled agents accumulate state, persist memory across sessions, and act continuously without a point at which a human reviews or halts them. Persistent context is also a persistence mechanism in the classic post-exploitation sense; an injection that lands in product memory or a workspace file reloads every time the agent starts. Autonomy without expiration converts a single successful attack into a standing compromise.

Main Agent Containment Techniques

Tool permissioning

Scope each agent to the specific tools and tool calls its job requires, not to whole applications or whole MCP servers. Deny by default, grant per agent, and review grants as jobs change. Plain-language role definitions help operationalize this; Cequence’s Agent Personas, for example, compile a job description into a scoped virtual MCP endpoint, so a customer service agent gets CRM read access rather than everything its user could reach.

Network egress controls

Deny outbound traffic by default and allowlist only the destinations an agent needs. Then go one step further: treat every allowed domain as a capability grant rather than a destination, because every function reachable through that domain becomes attack surface. Anthropic learned this when a sandboxed agent exfiltrated files through its own approved API domain using an attacker-supplied key; the fix inspected traffic in flight rather than trusting the destination.

API and SaaS access controls

Keep backend credentials out of the agent’s reach entirely. In a gateway architecture, each agent holds a key valid only at a policy enforcement point; the enforcement point holds the real credentials and brokers every API and SaaS call. A leaked agent key then opens nothing, and every call carries per-agent attribution that survives the session.

Memory containment

Treat persistent context, including product memory, configuration files, and mounted workspaces, as untrusted input each time an agent starts. Scan it at session startup, scope it per agent, and expire it on a schedule. Anthropic warns that the share of agent context surviving across sessions keeps growing, and every persisted item is a location where an injection can remain until the agent reloads it.

Best Practices for Agent Containment

Inventory and classify every agent by risk

Inventory every agent, including the unofficial ones employees wire up themselves, and classify each by blast radius: what it can read, change, and reach. Match isolation strength to the classification tier and to the operator’s capacity for oversight. A developer who reads bash and a knowledge worker who does not are running different threat models, and Anthropic builds entirely different containment for each.

Enforce least privilege and zero standing privilege

Give each agent the minimum scope its current task requires and nothing durable beyond it. Short-lived, per-agent credentials that expire with the session shrink both the window and the value of any compromise; standing privilege is ambient authority waiting to leak.

Build containment into the runtime

Containment belongs in the runtime path: enforcement that intercepts the tool call, the network request, or the file operation before it completes. Anthropic’s most instructive incidents were egress failures in which data left through permitted paths, and only inline, deterministic enforcement stands in that path.

Centralize policy enforcement and authorization

Per-product, per-vendor containment fragments quickly once an enterprise runs agents from many vendors against hundreds of internal APIs and MCP servers. A central enforcement point, such as an AI gateway, applies one policy set to every agent transaction regardless of which vendor built the agent, and gives security teams one place to authorize, monitor, and revoke.

Keep immutable audit logs and run containment drills on a schedule

Once an attack succeeds, the log often shows only a successful, authorized API call, so the audit trail must capture per-tool-call detail: which agent, which tool, which parameters, which data. Make those logs immutable and exportable to the SIEM. Then test the whole apparatus the way Anthropic did, by red-teaming your own deployment; the company’s most valuable findings came from controlled exercises against its own products, not from theory.

How Cequence Helps

Anthropic’s conclusions validate the architecture zero trust researchers have converged on: control what an agent is allowed to do, not just who it is. Cequence reached the same conclusions after a decade of watching how authenticated sessions misbehave at the application and API level. The difference is scope. Anthropic engineered containment for its own products; enterprises run agents from many vendors against hundreds of internal APIs and MCP servers, with no sandbox they control. The Cequence AI Gateway enforces the same containment principles in infrastructure the enterprise owns, regardless of vendor model.

The mapping to Anthropic’s lessons is direct. Anthropic keeps credentials out of the sandbox so they cannot be exfiltrated; the AI Gateway applies that pattern across every agent in the enterprise, where each agent holds a key valid only at the gateway, while the gateway holds all backend credentials, and no agent touches backend systems directly. Anthropic learned that an allowlisted domain is a capability grant; Agent Personas narrow the grant to the tool calls each agent’s job requires, defined in a plain-English job description, so a customer service AI agent gets read access to the CRM rather than everything its user could reach and modify.

The AI Gateway also addresses the gap Anthropic names but cannot close from inside a sandbox: once an attack succeeds, the log shows only a successful, authorized API call. The AI Gateway closes that gap at runtime. Behavioral monitoring flags off-spec tool-call sequences from an agent’s first action, sensitive data inspection screens requests and responses before data leaves, and per-tool-call attribution preserves a complete forensic trail of who did what, when, and where. Containment caps the blast radius; governance keeps the business running inside it. Contact us for a personalized demo to see it in action.

Agent Containment: Definition, Risks, and Techniques

What Is Agent Containment?

Why Agent Containment Differs from Traditional Security Controls

Autonomy and machine-speed actions

Non-human identities and credential sprawl

Multi-agent delegation and instruction propagation

Prompt injection as a containment bypass

Agent Containment vs. AI Alignment

Common Agent Containment Risks

Tool misuse

Capability-intent mismatch

Ambient authority leakage

Unbounded autonomy

Main Agent Containment Techniques

Tool permissioning

Network egress controls

API and SaaS access controls

Memory containment

Best Practices for Agent Containment

Inventory and classify every agent by risk

Enforce least privilege and zero standing privilege

Build containment into the runtime

Centralize policy enforcement and authorization

Keep immutable audit logs and run containment drills on a schedule

How Cequence Helps

Sign up for the latest Cequence Security news

Related Articles

Blue Screened: Microsoft Windows Computers Crashed by Automated CrowdStrike Update

Why Enterprises Need an MCP Gateway, Not Native Connectors

Agent Containment: Definition, Risks, and Techniques