Most writing on agent security is a list of incidents. Incidents are useful, but they do not tell a defender where to look. We organize the problem by layer instead. An agent is a stack: a model that decides, tools it can call, a protocol that connects them, an identity that authorizes them, a supply chain that delivers them, and autonomy that chains it all together. Each layer fails in its own way, and a control that fixes one layer does nothing for the others.
This is the Raptoric Agent Attack Surface Map. It is deliberately protocol-agnostic. MCP is the dominant connector in 2026, so most concrete examples are MCP, but the layers apply to any tool-using agent.
The model cannot reliably tell its operator's instructions apart from text it merely read. Any content the agent ingests, a web page, a document, a tool result, an email, can carry instructions the model will follow.
Tool definitions are part of the prompt. A malicious or compromised tool can attack the model through its own description, before it is ever invoked, and a legitimate tool can be driven to do harm with valid inputs.
MCP servers are ordinary software, and they carry ordinary software bugs. The difference is reach: these bugs now sit behind an autonomous agent that an attacker can influence through input.
Agents act with delegated authority. When their tools hold broad scopes and long-lived credentials, a single hijack turns the agent into a confused deputy that uses your access against you.
The agent ecosystem installs third-party servers the way the web installs npm packages, often with less scrutiny. The registry is a distribution channel for both capability and compromise.
Autonomy is what turns a flaw into an incident. A bug a human would catch becomes an action an agent takes at machine speed, then chains into the next action. This layer is why agent risk is not just the sum of its parts.
Indirect prompt injection is the defining weakness of tool-using AI, and it is now measured by peer-reviewed benchmarks rather than anecdotes. InjecAgent, published at ACL Findings 2024, built 1,054 test cases across 17 user tools and 62 attacker tools. A ReAct-prompted GPT-4 agent acted on the injected instruction about 24 percent of the time, and reinforcing the payload so it impersonated the system nearly doubled that rate.
AgentDojo, a NeurIPS 2024 benchmark from ETH Zurich and Invariant Labs, tells the more uncomfortable story. Average attack success against the best agents was under 25 percent, which sounds tolerable until you read the distribution. A single crafted injection reached 92 percent success against GPT-4o in a Slack environment, and placing the payload at the end of a tool response reached up to 70 percent average success. Security is set by the worst case an attacker can find, not the average.
The most important result is about defenses, not attacks. A NAACL 2025 study evaluated eight published injection defenses and bypassed all eight with adaptive attacks, holding success above 50 percent. A stronger 2025 result from researchers at Google DeepMind, OpenAI, and Anthropic bypassed 12 recent defenses, most at over 90 percent success, and drove two training-based defenses to 100 percent. One defense that reported a 2 percent failure rate under static testing failed 96 percent of the time once the attacker adapted. Our reading is blunt: there is no defense you can currently deploy that makes a tool-using agent safe to point at untrusted input. Plan for that, do not wish it away.[7],[8],[9],[10]
Classic application security assumes code runs and then misbehaves. MCP breaks that assumption. When a client connects to a server it calls tools/list, and the returned tool descriptions enter the model's context. Those descriptions are attacker-controllable text in a position of trust. Trail of Bits named the technique line jumping: a malicious server can manipulate model behavior without ever being invoked, because its description is already in the prompt. Invariant Labs documented the same class as tool poisoning and published working proof-of-concept code.
This matters because it defeats the intuitive control. Teams reason about agent risk as a function of which tools they let the agent call. Line jumping shows that merely listing a hostile tool is enough. The trust boundary is not the tool invocation, it is the moment a server's metadata reaches the model.[5],[6],[14]
The protocol-level attacks are novel; the implementation-level ones are depressingly familiar. Endor Labs analyzed 2,614 MCP implementations and found that 82 percent use file-system APIs prone to path traversal, 67 percent touch code-injection-class APIs, and 34 percent touch command-injection-class APIs. These are usage rates for risky API categories, not proof of exploitable bugs, and we will not pretend otherwise. They measure the size of the attack surface, and the surface is large.
Offensive testing fills in the other half. Equixly tested a set of popular MCP servers and reported command injection in 43 percent, server-side request forgery in roughly 30 percent, and path traversal or arbitrary file read in roughly 22 percent. The sample size was not disclosed, so this is evidence about popular servers rather than a population rate. Even read conservatively, it says that injection bugs are common in the servers people actually install.
The reason this is worse than a normal appsec backlog is the layer above it. A command-injection bug in a script is a bug. The same bug in an MCP server is reachable by an autonomous agent whose input an attacker can shape through a web page or a document. The agent becomes the delivery mechanism for the exploit.[1],[2]
This is not a forecast. CVE-2025-53967 was a remote-code-execution flaw in the Framelink Figma MCP server, caused by unsanitized parameters passed into a shell command. The server had over 600,000 downloads and more than 10,000 GitHub stars, so the blast radius was real. It was fixed in version 0.6.3 in September 2025.
More telling is CVE-2025-49596, a critical remote-code-execution flaw in Anthropic's own MCP Inspector, scored CVSS 9.4. The Inspector proxy ran without authentication by default, and combined with a 19-year-old browser weakness it let a malicious website run commands on a developer's machine. The lesson is not that one tool was flawed. It is that insecure-by-default posture reached first-party tooling from the team that authored the protocol. Convenience was the default; security was the patch. Further CVEs followed the same pattern across the ecosystem, including a remote-code-execution flaw in the widely used mcp-remote client and the EscapeRoute path-traversal flaws in Anthropic's filesystem server.[3],[4],[12],[13]
In December 2025 the OWASP Gen AI Security Project published the Top 10 for Agentic Applications for 2026, built with more than 100 contributors. The significant choice was to make it distinct from the existing LLM Top 10, with its own categories from ASI01 Agent Goal Hijack through ASI10 Rogue Agents, including ASI02 Tool Misuse and ASI05 Unexpected Code Execution. That separation is an admission worth stating plainly: agentic systems are a different risk domain than chat-style LLM applications, and the controls you wrote for one do not cover the other. We map our attack-surface layers to these categories throughout this report so teams can connect the framework to concrete tests.[11]
A good research report is honest about its edges. These are the questions the public record cannot answer today. They are also our agenda: each is a gap we are equipped to fill with our own testing.
The public record has API-usage rates and small-sample exploit rates, but no authoritative count of live public servers and no population-representative, confirmed-vulnerability rate. This is the headline number the field is missing.
We have documented CVEs and proof-of-concept exploits. We do not have verified data on confirmed compromises of production agents and MCP deployments. Honeypot and incident-response data would change the conversation.
Adaptive-attack research has demolished generic injection defenses. MCP-specific mitigations such as description sanitization, registration-time scanning, and scoped permissions have not been put through the same adversarial wringer in public.
MCP is not inherently insecure, but its early ecosystem prioritized convenience over security. The protocol adds a real attack surface beyond classic bugs, most notably tool poisoning through tool descriptions, and independent testing has found common implementation flaws such as command injection in popular servers. Treat MCP servers as untrusted code, run them with least privilege, and assume the model can be steered by content it reads.
Not reliably, as of 2026. Multiple studies have shown that published defenses fail once an attacker adapts to them, with one 2025 result bypassing 12 defenses at over 90 percent success. The practical posture is containment, not prevention: assume injection will sometimes succeed and limit what a hijacked agent can do through scoped permissions, human approval for high-impact actions, and bounded autonomy.
Tool poisoning is an attack where a malicious MCP server embeds instructions in its tool descriptions. Because those descriptions are loaded into the model's context when the client lists available tools, the model can be manipulated before any tool is actually called. Trail of Bits described a version of this as line jumping. It defeats the assumption that you are safe as long as you do not invoke a suspicious tool.
The difference is autonomy plus untrusted input. A classic bug needs an attacker to reach it. An agentic system can be steered to reach the bug on the attacker's behalf, at machine speed, by feeding it crafted content. The flaws are often ordinary, but the reach, blast radius, and speed are not, which is why agentic systems now have their own OWASP Top 10.