Encoded Prompt Injection: Why LLM Guardrails Fail

On 04 May, an attacker drained roughly $175,000 in tokens from an AI-controlled crypto wallet using a tweet written in Morse code. The wallet belonged to Grok, xAI’s chatbot. Bankrbot, an automated finance agent connected to Grok through a tool-calling layer, executed the transfer. The attack required no smart-contract bug, no stolen private key, and no compromise of either model. It required one obfuscated message and a chain of trust nobody had thought to inspect.

Bankrbot’s own post-mortem confirmed the vector. First, the attacker sent a Bankr Club Membership NFT to Grok’s auto-provisioned wallet. That gift unlocked Grok’s ability to invoke Bankrbot’s transfer tools. Then a Morse-coded reply on X told Grok to instruct Bankrbot to send 3 billion DRB to the attacker’s address. Grok decoded the message, posted a clean English version tagging Bankrbot, and Bankrbot executed.

This is being widely described as a Morse code prompt injection. That label is correct but incomplete. The deeper story is structural, and every enterprise deploying agentic AI needs to internalize it. Encoded prompt injection is not a problem you can monitor your way out of at the LLM layer. It is the same class of attack the web industry already lost decades trying to filter, and the only durable fix lives somewhere else entirely.

Why “Just Detect the Injection” Will Always Lose

The encoding space available to an attacker is unbounded. Morse is one of the simpler options. Recent research evaluating prompt injection defenses against adaptive attackers showed that combining semantic mutation with character-level obfuscation, including encoding-based mutations, produces stronger attacks than either alone. An NVIDIA and Johns Hopkins position paper published last month reached the same conclusion architecturally. The only durable defenses are at the system level, with strict separation between what the model can observe and what the model is permitted to decide.

The reason is straightforward. LLMs are trained to be helpful decoders. That is a feature, not a bug. A model that can read Morse, Base64, ROT13, leetspeak, image text, Python string concatenation, and any number of multi-layer combinations is doing exactly what it was built to do. Critically, you cannot blocklist the encodings without breaking the assistant. You also cannot reliably detect the intent of decoded text, because the same string can be helpful in one context and weaponized in another.

This pattern is identical to the encoding battles the web fought and lost on filters alone. SQL injection was not solved by smarter input parsers. Instead, it was solved by parameterized queries that put a structural barrier between code and data. XSS was not solved by smarter HTML scanners; it was solved by output encoding and content security policy. CSRF was not solved by detecting forged requests at the HTTP layer; it was solved by tokens that prove a request is authorized at the action layer. Every one of these classes saw years of “we’ll just add another filter.” Every one was ultimately fixed by moving the trust boundary closer to the action.

What Grok Did Right and What Bankrbot Got Wrong

Look closely at the Grok incident and you will see that Grok did its job correctly. It received text. It decoded the text. It posted a reply. None of that is anomalous behavior for a public AI assistant.

The breach lives at the next hop. Bankrbot accepted Grok’s public reply as if it were a trusted human instruction. There was no policy in between asking the right question. The right question is not “is this message in English.” The right question is “is this action authorized for this principal.” A wallet does not care whether an instruction arrived in Morse, Mandarin, or marker pen. It cares whether the principal has the right to move those funds, to that recipient, in that amount, in that context. Bankr’s earlier safeguard, which had blocked all replies from Grok after a similar attack in March 2025, was bypassed when the gifted membership NFT created an alternate tool-call path. The control was tied to a surface, not to an action.

This is the same architectural mistake we see repeatedly across enterprise agentic deployments. As we wrote recently, prompt injection is not a model-capability problem; it is a pipeline-stage problem. No amount of model improvement will eliminate it. Johns Hopkins research published earlier this year also showed that every coding agent tested was vulnerable, with adaptive attack success rates above 85%. As a result, the practical question shifts. If you cannot prevent an agent from being coerced, you must constrain what a coerced agent can do.

Move the Guardrails to the Action Layer

The right place to enforce security in an agentic system is wherever the consequential action happens, not wherever the language is interpreted. For a wallet, that means recipient allowlists, per-transaction spend limits, principal-bound authorization, and a hard separation between what an agent can say and what an agent can do.

For an enterprise, the equivalents are well-understood. First, define each agent’s job in plain language and enforce a least-privilege permission set at the gateway, not in the prompt. Second, bind tokens to originating sessions so a hijacked output cannot be replayed from elsewhere. Third, require human or policy confirmation for high-impact actions regardless of how natural the request looks. Finally, monitor behavior, not just authentication, because the failure mode of a coerced agent is rarely “no auth.” It is “valid auth, wrong action.”

Cequence built Agent Personas and the AI Gateway on this premise. Every tool call passes through a control point that knows the agent’s scope, the user’s identity, and the behavioral envelope of the action. A prompt injection that successfully manipulates the model still has to clear an authorization decision the model never sees. That is the structural separation web security learned over a decade. Agentic security cannot afford to relearn it the slow way.

The Question Worth Asking

The Grok incident will fade from the news cycle within a week, but the architectural lesson should not. Every enterprise running agentic systems with real consequences attached, whether that is moving funds, modifying records, sending external messages, or executing trades, should ask one question. If my agent’s reasoning layer is fully compromised tomorrow, what stops the action?

If the only answer is “we tell the model to be careful” or “we scan for known injection patterns,” the answer is wrong. Encoded prompt injection has already shown the limits of that approach in public, with on-chain evidence. The fix is at the action layer, where authorization is deterministic and behavior is observable. Not at the layer where text is interpreted.

Talk to Cequence about moving your agentic guardrails to where they actually hold.

What is API Security?

What is API Security?

Encoded Prompt Injection: Why LLM Guardrails Are at the Wrong Layer

Why “Just Detect the Injection” Will Always Lose

What Grok Did Right and What Bankrbot Got Wrong

Move the Guardrails to the Action Layer

The Question Worth Asking

What is API Security?

What is API Security?

Encoded Prompt Injection: Why LLM Guardrails Are at the Wrong Layer

Why “Just Detect the Injection” Will Always Lose

What Grok Did Right and What Bankrbot Got Wrong

Move the Guardrails to the Action Layer

The Question Worth Asking

Sign up for the latest Cequence Security news

Related Articles

The Salesloft Breach: In the AI Era, IP Reputation Monitoring Still Matters for Authentication Tokens

AI Agent Monetization Is Here: Turning Bot Traffic Into Trusted Revenue

Security in the Age of Autonomous AI