Blog | April 16, 2026 | 7 MIN READ

Even the Best AI Agents Leak Secrets. Prompt Injection Is Why.

Shreyans Mehta

Shreyans Mehta

CTO & Co-Founder

A stylized image of a vault with four lock icons surrounding it representing prompt injection

This week, researchers from Johns Hopkins University published findings showing they could hijack AI agents from three of the world’s largest technology companies to steal API keys and credentials. The targets were not obscure tools. They were production-grade agents integrated with GitHub Actions from Anthropic, Google, and Microsoft.

All three vendors paid bug bounties. None assigned CVEs. None published public advisories. Users pinned to vulnerable versions may never know they were exposed.

This is not a story about careless vendors or one-off bugs. It is a story about a structural gap in how AI agents interact with credentials. Prompt injection, the technique used in all three attacks, remains an unsolved problem across the industry.

Prompt Injection Is Not a Bug. It’s Inherent to the Design.

The attack technique is deceptively simple. An attacker embeds malicious instructions in a pull request title, issue description, or comment. The AI agent reads that content as task context, fails to distinguish it from legitimate instructions, and executes the embedded commands. In one demonstrated case, the agent posted stolen credentials directly into a public PR comment. The attacker could then change the title back, close the PR, and delete the evidence.

This works because AI agents have no reliable way to separate trusted instructions from untrusted input. A systematic analysis of 78 studies published earlier this year found that every tested coding agent was vulnerable to prompt injection, with adaptive attack success rates exceeding 85%. The defenses that exist today – system prompts, guardrails, and input filtering – reduce the success rate but do not eliminate it.

As a result, any credential that an AI agent can access is a credential an attacker can potentially exfiltrate. This is true whether the agent runs in a GitHub Actions runner, on a developer’s laptop, or inside an enterprise CI/CD pipeline.

The Problem Extends to Every MCP Configuration

The credential exposure is not limited to GitHub Actions. It exists everywhere MCP servers are configured.

Remote MCP servers store tokens in configuration files like claude_desktop_config.json and .cursor/settings.json. Those files routinely contain hardcoded API keys, OAuth tokens, and service credentials. GitGuardian’s 2026 report found over 24,000 unique secrets exposed in MCP configuration files on public GitHub, including more than 2,100 confirmed valid credentials.

Local MCP servers have the same problem. When a developer runs a local MCP server via npx or uvx, the full command including embedded API keys sits in a configuration file on their workstation. That file is readable by any process on the machine. Check Point Research demonstrated this directly with CVE-2026-21852: a malicious repository could redirect an AI coding tool’s API traffic to an attacker-controlled server and exfiltrate credentials before the developer even saw a trust prompt. Simply cloning a repository was enough.

The attack surface extends further. Wiz documented a campaign called “prt-scan” in which attackers opened over 500 malicious pull requests targeting GitHub Actions workflows, stealing cloud credentials for AWS, Azure, and GCP. Supply chain attacks like Shai-Hulud showed that 59% of compromised machines were CI/CD runners, not personal workstations.

Whether the MCP server is remote or local, cloud-hosted or on-premises, the credentials are exposed at the client layer. That is the common thread.

Short-Lived Tokens Do Not Solve This

A common response at security conferences is to adopt short-lived tokens. In theory, a token that expires in minutes limits the window of exploitation.

In practice, short-lived access tokens require a refresh token to renew them. If both the access token and the refresh token are accessible anywhere in the same environment your agent can reach, whether that’s environment variables, a credential store, or a CI/CD runner, then a single compromise exposes both. An attacker who obtains the refresh token can generate new access tokens indefinitely. The short expiration window becomes meaningless.

This problem is amplified with local MCP servers. The configuration file that launches the server typically contains both the access mechanism and any refresh credentials needed to maintain the connection. A single file compromise, whether through a malicious repository, a supply chain attack, or a prompt injection exploit, hands the attacker everything needed for persistent access.

The refresh token becomes a long-lived credential by another name. It still needs to be stored somewhere, and that somewhere is the exact surface area that attackers are already targeting.

The fix is not shorter token lifetimes. The fix is removing application credentials from the client entirely.

Credential Indirection: Remove Tokens from the Attack Surface

The Cequence AI Gateway takes a different architectural approach. Instead of embedding application tokens in MCP configuration files, the AI Gateway introduces a two-part authentication model.

Developers configure their MCP clients with gateway-issued credentials. Those credentials authenticate the user to the gateway, not to the target application. The gateway then substitutes the real application credentials, including OAuth tokens and refresh tokens, at runtime on the server side. They never touch the client environment.

This indirection fundamentally changes the blast radius. If a gateway credential leaks through a GitHub commit, a misconfigured CI/CD runner, or a hijacked AI agent, the attacker cannot use it to directly access the downstream system. The real credentials for Salesforce, Snowflake, Confluence, or any other connected application never leave the gateway infrastructure. Even a successful prompt injection attack yields a token that cannot reach the target.

However, defense in depth demands layering scanning on top of this architecture. Organizations should still run pre-commit hooks, CI pipeline checks, and runtime DLP on gateway tokens. The difference is that scanning becomes a second line of defense rather than the only one. Even if detection lags, the credential itself is not directly usable against downstream systems.

Governance Across the Credential Lifecycle

Architecture alone is not sufficient without governance. Built-in token lifecycle management handles rotation, scoping, and expiration at the gateway level, not at the client. Session binding locks authenticated sessions to originating IP addresses, stopping token reuse from unfamiliar locations. A Trusted MCP Registry ensures agents connect only to vetted, curated servers rather than arbitrary third-party endpoints that may be malicious.

Agent Personas scope each agent’s permissions to its specific role. Instead of broad service accounts with access to everything, each agent operates with the minimum privileges required for its defined tasks. A compromised agent with read-only access to a single application is a fundamentally different risk profile than one with write access across a dozen systems.

Together, these layers provide what no single scanning tool, token policy, or prompt engineering technique can: prevention, containment, and governance across the entire credential lifecycle.

The Path Forward

Prompt injection is not going away. Researchers have been studying it since AI agents began interacting with external data, and no vendor has produced a reliable, general-purpose defense. The Johns Hopkins research this week confirmed what many in the security community already suspected: the biggest AI vendors have not solved this problem either.

That reality changes the security calculus. If you cannot guarantee that an AI agent will never be tricked into leaking credentials, then you must ensure those credentials cannot cause damage when they are leaked.

Organizations deploying agentic AI should ask three questions. Where are your AI agent credentials stored today? If an agent is compromised through prompt injection, what can an attacker reach? Can you revoke any single agent’s access in under five minutes?

If the answers are uncertain, the risk is real. Start the conversation with Cequence about securing your agentic AI deployment before the next prompt injection exploit finds your credentials first.

Shreyans Mehta

Author

Shreyans Mehta

CTO & Co-Founder

Shreyans Mehta, Cequence CTO & co-founder, is an innovative, patent-holding leader in network security. Formerly at Symantec, he developed advanced technologies for real-time packet inspection and cloud analytics. Shreyans holds a Master's in Computer Science from the University of Southern California.

Related Articles