Research/Report
Raptoric ResearchEdition 1.0 · June 2026 · 22 min read

The MCP and AI-Agent Attack Surface

Tool-using AI shipped its convenience first and its security model second. This is the evidence on where it breaks, what the data does and does not prove, and how to defend agents that run on attacker-controlled input.
Key takeaways
01Agent insecurity is an architecture problem, not a model problem. The risk comes from giving an autonomous system real tools and feeding it untrusted input at the same time.
02Prompt injection has no reliable fix today. When defenses are tested by an attacker who adapts, they fail: one 2025 study bypassed 12 published defenses, most at over 90 percent success.
03The Model Context Protocol adds a protocol-level attack surface on top of classic bugs. A malicious server can steer a model through its tool descriptions before any tool is ever called.
04Classic application-security flaws are widespread in MCP server code. Independent testing of popular servers found command injection in 43 percent of those examined.
05Critical remote-code-execution bugs have already shipped in both third-party and first-party MCP tooling, including a CVSS 9.4 flaw in Anthropic's own MCP Inspector.
06The single most useful number, the real rate of exploitable flaws across the live public MCP population, does not exist yet. That gap is what we are measuring next.
43%
[2]
of popular MCP server implementations tested by Equixly were vulnerable to command injection, the most common flaw class found.
Offensive testing of a sample of popular servers. Equixly did not disclose the sample size, so read this as a finding about popular servers, not a population rate.
34%
[1]
of 2,614 MCP implementations analyzed by Endor Labs use command-injection-class APIs; 82 percent touch file-system APIs prone to path traversal.
These are sensitive-API usage rates, not confirmed-exploitable rates. They show how large the attack surface is, not how many servers are provably vulnerable.
9.4
[4]
CVSS score of CVE-2025-49596, a remote-code-execution flaw in Anthropic's official MCP Inspector that was exploitable insecure-by-default.
>90%
[10]
attack success rate when adaptive attacks were run against most of 12 published prompt-injection defenses, several of which had reported near-zero vulnerability under static tests.
92%
[8]
success rate of a single crafted injection against GPT-4o in the Slack environment of the AgentDojo benchmark.
A worst-case, model- and environment-specific result. Average attack success across AgentDojo was far lower. It shows the ceiling, not the norm.
ASI01–ASI10
[11]
the ten risk categories in OWASP's Top 10 for Agentic Applications, a dedicated agentic framework published in December 2025.
Figures are attributed inline. Read the caveats: vendor studies measure attack surface, not confirmed exploitability.
§ 01The framework

The Raptoric Agent Attack Surface Map

Most writing on agent security is a list of incidents. Incidents are useful, but they do not tell a defender where to look. We organize the problem by layer instead. An agent is a stack: a model that decides, tools it can call, a protocol that connects them, an identity that authorizes them, a supply chain that delivers them, and autonomy that chains it all together. Each layer fails in its own way, and a control that fixes one layer does nothing for the others.

This is the Raptoric Agent Attack Surface Map. It is deliberately protocol-agnostic. MCP is the dominant connector in 2026, so most concrete examples are MCP, but the layers apply to any tool-using agent.

L1

Model layer: instruction and data confusion

The model cannot reliably tell its operator's instructions apart from text it merely read. Any content the agent ingests, a web page, a document, a tool result, an email, can carry instructions the model will follow.

Direct prompt injection from a hostile userIndirect prompt injection from retrieved or tool-returned contentReinforced payloads that raise success rates by impersonating the system
Maps to: LLM01 Prompt Injection; ASI01 Agent Goal Hijack
L2

Tool layer: poisoned and abused capabilities

Tool definitions are part of the prompt. A malicious or compromised tool can attack the model through its own description, before it is ever invoked, and a legitimate tool can be driven to do harm with valid inputs.

Tool poisoning and line jumping through tool descriptionsRug-pull definitions that change after a user has approved themTool misuse: steering a safe tool toward a harmful action
Maps to: ASI02 Tool Misuse and Exploitation
L3

Protocol and implementation layer: classic bugs, new reach

MCP servers are ordinary software, and they carry ordinary software bugs. The difference is reach: these bugs now sit behind an autonomous agent that an attacker can influence through input.

Command injection, SSRF, and path traversal in server codeRemote code execution in servers and developer toolingInsecure-by-default configurations, missing authentication
Maps to: ASI05 Unexpected Code Execution
L4

Identity and permission layer: scope and the confused deputy

Agents act with delegated authority. When their tools hold broad scopes and long-lived credentials, a single hijack turns the agent into a confused deputy that uses your access against you.

Over-broad OAuth scopes and standing tokensConfused-deputy attacks across connected systemsCredential and token handling inside server processes
Maps to: ASI03 Privilege and Identity Abuse
L5

Supply-chain layer: the servers you install

The agent ecosystem installs third-party servers the way the web installs npm packages, often with less scrutiny. The registry is a distribution channel for both capability and compromise.

Malicious or typosquatted servers in registriesServer impersonation and unverified publishersTransitive dependency risk inside server code
Maps to: ASI08 Supply Chain and Dependency Risk
L6

Autonomy layer: the multiplier

Autonomy is what turns a flaw into an incident. A bug a human would catch becomes an action an agent takes at machine speed, then chains into the next action. This layer is why agent risk is not just the sum of its parts.

Runaway action loops and unbounded tool chainsRogue or impersonated agents in multi-agent systemsCascading failure across agents that trust each other
Maps to: ASI10 Rogue Agents
§ 02The evidence

What the record shows

Prompt injection is measured, and the numbers are not improving

Indirect prompt injection is the defining weakness of tool-using AI, and it is now measured by peer-reviewed benchmarks rather than anecdotes. InjecAgent, published at ACL Findings 2024, built 1,054 test cases across 17 user tools and 62 attacker tools. A ReAct-prompted GPT-4 agent acted on the injected instruction about 24 percent of the time, and reinforcing the payload so it impersonated the system nearly doubled that rate.

AgentDojo, a NeurIPS 2024 benchmark from ETH Zurich and Invariant Labs, tells the more uncomfortable story. Average attack success against the best agents was under 25 percent, which sounds tolerable until you read the distribution. A single crafted injection reached 92 percent success against GPT-4o in a Slack environment, and placing the payload at the end of a tool response reached up to 70 percent average success. Security is set by the worst case an attacker can find, not the average.

The most important result is about defenses, not attacks. A NAACL 2025 study evaluated eight published injection defenses and bypassed all eight with adaptive attacks, holding success above 50 percent. A stronger 2025 result from researchers at Google DeepMind, OpenAI, and Anthropic bypassed 12 recent defenses, most at over 90 percent success, and drove two training-based defenses to 100 percent. One defense that reported a 2 percent failure rate under static testing failed 96 percent of the time once the attacker adapted. Our reading is blunt: there is no defense you can currently deploy that makes a tool-using agent safe to point at untrusted input. Plan for that, do not wish it away.[7],[8],[9],[10]

MCP can attack the model before a single tool runs

Classic application security assumes code runs and then misbehaves. MCP breaks that assumption. When a client connects to a server it calls tools/list, and the returned tool descriptions enter the model's context. Those descriptions are attacker-controllable text in a position of trust. Trail of Bits named the technique line jumping: a malicious server can manipulate model behavior without ever being invoked, because its description is already in the prompt. Invariant Labs documented the same class as tool poisoning and published working proof-of-concept code.

This matters because it defeats the intuitive control. Teams reason about agent risk as a function of which tools they let the agent call. Line jumping shows that merely listing a hostile tool is enough. The trust boundary is not the tool invocation, it is the moment a server's metadata reaches the model.[5],[6],[14]

Most MCP servers carry ordinary bugs, now wired to an agent

The protocol-level attacks are novel; the implementation-level ones are depressingly familiar. Endor Labs analyzed 2,614 MCP implementations and found that 82 percent use file-system APIs prone to path traversal, 67 percent touch code-injection-class APIs, and 34 percent touch command-injection-class APIs. These are usage rates for risky API categories, not proof of exploitable bugs, and we will not pretend otherwise. They measure the size of the attack surface, and the surface is large.

Offensive testing fills in the other half. Equixly tested a set of popular MCP servers and reported command injection in 43 percent, server-side request forgery in roughly 30 percent, and path traversal or arbitrary file read in roughly 22 percent. The sample size was not disclosed, so this is evidence about popular servers rather than a population rate. Even read conservatively, it says that injection bugs are common in the servers people actually install.

The reason this is worse than a normal appsec backlog is the layer above it. A command-injection bug in a script is a bug. The same bug in an MCP server is reachable by an autonomous agent whose input an attacker can shape through a web page or a document. The agent becomes the delivery mechanism for the exploit.[1],[2]

The critical CVEs are already here, including in first-party tooling

This is not a forecast. CVE-2025-53967 was a remote-code-execution flaw in the Framelink Figma MCP server, caused by unsanitized parameters passed into a shell command. The server had over 600,000 downloads and more than 10,000 GitHub stars, so the blast radius was real. It was fixed in version 0.6.3 in September 2025.

More telling is CVE-2025-49596, a critical remote-code-execution flaw in Anthropic's own MCP Inspector, scored CVSS 9.4. The Inspector proxy ran without authentication by default, and combined with a 19-year-old browser weakness it let a malicious website run commands on a developer's machine. The lesson is not that one tool was flawed. It is that insecure-by-default posture reached first-party tooling from the team that authored the protocol. Convenience was the default; security was the patch. Further CVEs followed the same pattern across the ecosystem, including a remote-code-execution flaw in the widely used mcp-remote client and the EscapeRoute path-traversal flaws in Anthropic's filesystem server.[3],[4],[12],[13]

The field finally has a map, and it is a separate map

In December 2025 the OWASP Gen AI Security Project published the Top 10 for Agentic Applications for 2026, built with more than 100 contributors. The significant choice was to make it distinct from the existing LLM Top 10, with its own categories from ASI01 Agent Goal Hijack through ASI10 Rogue Agents, including ASI02 Tool Misuse and ASI05 Unexpected Code Execution. That separation is an admission worth stating plainly: agentic systems are a different risk domain than chat-style LLM applications, and the controls you wrote for one do not cover the other. We map our attack-surface layers to these categories throughout this report so teams can connect the framework to concrete tests.[11]

§ 03Our estimate

Modeling the gap, transparently

Raptoric estimate
35–50%
Raptoric estimate: a large share of actively used public MCP servers carry at least one high-severity, reachable flaw
How we get there
  • Equixly's offensive testing of popular servers found command injection alone in 43 percent, with SSRF and path traversal adding more affected servers on top of that.
  • Endor Labs found 34 percent of 2,614 implementations use command-injection-class APIs, an independent signal that the underlying risk is widespread, not isolated to one test set.
  • Combining a high single-class exploit rate with broad risky-API usage, we model the share of actively used public servers carrying at least one high-severity, network-reachable flaw at roughly 35 to 50 percent.
This is an estimate, not a measurement. Both inputs are vendor research, one with an undisclosed sample size, and neither is a random sample of the live server population. We publish it as a transparent working figure and we will replace it with measured data from our own large-scale testing.
§ 04The open questions

What the data does not show yet

A good research report is honest about its edges. These are the questions the public record cannot answer today. They are also our agenda: each is a gap we are equipped to fill with our own testing.

How many public MCP servers exist, and how many are truly exploitable?

The public record has API-usage rates and small-sample exploit rates, but no authoritative count of live public servers and no population-representative, confirmed-vulnerability rate. This is the headline number the field is missing.

What is the real in-the-wild incident rate?

We have documented CVEs and proof-of-concept exploits. We do not have verified data on confirmed compromises of production agents and MCP deployments. Honeypot and incident-response data would change the conversation.

Do MCP-specific defenses survive an adaptive attacker?

Adaptive-attack research has demolished generic injection defenses. MCP-specific mitigations such as description sanitization, registration-time scanning, and scoped permissions have not been put through the same adversarial wringer in public.

§ 05For defenders

The hardening checklist

01

Treat the agent as running attacker-controlled input

  • Assume any content the agent reads can carry instructions. Design as if injection will succeed, because the data says it will.
  • Put a human approval step in front of any irreversible or high-impact tool action, not just risky-looking ones.
  • Constrain what the agent can do by default. Allow-list tools and actions rather than block-listing the dangerous ones.
02

Lock down the tool and protocol layer

  • Pin and review MCP servers like dependencies. Treat a new server as untrusted code, because it is.
  • Scan tool descriptions and server metadata for injected instructions at registration time, not just tool outputs at runtime.
  • Run servers with least privilege, no standing admin credentials, and network egress controls to blunt SSRF.
  • Patch fast. The CVEs in this report were fixed quickly; the exposure was in the window before teams updated.
03

Contain identity, autonomy, and supply chain

  • Scope every token to the narrowest permission set and the shortest lifetime the workflow tolerates.
  • Bound autonomy: cap tool-call chains, rate-limit actions, and log every tool invocation with its triggering input.
  • Verify server provenance and publisher before install; prefer signed, reviewed servers over registry convenience.
  • Red-team the whole agent, not just the model. Test the tool, protocol, identity, and autonomy layers together.
§ 06FAQ

Questions, answered

Is the Model Context Protocol insecure?

MCP is not inherently insecure, but its early ecosystem prioritized convenience over security. The protocol adds a real attack surface beyond classic bugs, most notably tool poisoning through tool descriptions, and independent testing has found common implementation flaws such as command injection in popular servers. Treat MCP servers as untrusted code, run them with least privilege, and assume the model can be steered by content it reads.

Can prompt injection be fixed?

Not reliably, as of 2026. Multiple studies have shown that published defenses fail once an attacker adapts to them, with one 2025 result bypassing 12 defenses at over 90 percent success. The practical posture is containment, not prevention: assume injection will sometimes succeed and limit what a hijacked agent can do through scoped permissions, human approval for high-impact actions, and bounded autonomy.

What is tool poisoning?

Tool poisoning is an attack where a malicious MCP server embeds instructions in its tool descriptions. Because those descriptions are loaded into the model's context when the client lists available tools, the model can be manipulated before any tool is actually called. Trail of Bits described a version of this as line jumping. It defeats the assumption that you are safe as long as you do not invoke a suspicious tool.

How is agent security different from normal application security?

The difference is autonomy plus untrusted input. A classic bug needs an attacker to reach it. An agentic system can be steered to reach the bug on the attacker's behalf, at machine speed, by feeding it crafted content. The flaws are often ordinary, but the reach, blast radius, and speed are not, which is why agentic systems now have their own OWASP Top 10.

§ 07Reference

How to cite this report

This report is free to cite, quote, and link. Attribution is appreciated and helps others find the source.

Raptoric (2026). The MCP and AI-Agent Attack Surface, Edition 1.0. https://raptoric.com/en/research/mcp-ai-agent-attack-surface
§ 08Sources

References

  1. [1]Endor Labs. Classic Vulnerabilities Meet AI Infrastructure: Why MCP Needs AppSec. Endor Labs (2025 State of Dependency Management Report), January 2026. https://www.endorlabs.com/learn/classic-vulnerabilities-meet-ai-infrastructure-why-mcp-needs-appsec
  2. [2]Equixly. MCP Servers: The New Security Nightmare. Equixly, March 2025. https://equixly.com/blog/2025/03/29/mcp-server-new-security-nightmare/
  3. [3]Endor Labs / Imperva. CVE-2025-53967: Remote Code Execution in the Framelink Figma MCP Server. Endor Labs; GitHub Advisory GHSA-gxw4-4fc5-9gr5, September 2025. https://www.endorlabs.com/learn/cve-2025-53967-remote-code-execution-in-framelink-figma-mcp-server
  4. [4]Oligo Security. Critical RCE Vulnerability in Anthropic MCP Inspector (CVE-2025-49596). Oligo Security; NVD (CVSS 9.4), June 2025. https://www.oligo.security/blog/critical-rce-vulnerability-in-anthropic-mcp-inspector-cve-2025-49596
  5. [5]Trail of Bits. Jumping the Line: How MCP Servers Can Attack You Before You Ever Use Them. Trail of Bits, April 2025. https://blog.trailofbits.com/2025/04/21/jumping-the-line-how-mcp-servers-can-attack-you-before-you-ever-use-them/
  6. [6]Invariant Labs. MCP Security Notification: Tool Poisoning Attacks. Invariant Labs, April 2025. https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks
  7. [7]Zhan, Q. et al. (UIUC). InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated LLM Agents. Findings of ACL 2024, 2024. https://arxiv.org/abs/2403.02691
  8. [8]Debenedetti, E. et al. (ETH Zurich / Invariant Labs). AgentDojo: A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents. NeurIPS 2024 Datasets and Benchmarks Track, 2024. https://arxiv.org/abs/2406.13352
  9. [9]Zhan, Q., Fang, R., Panchal, H., Kang, D. (UIUC). Adaptive Attacks Bypass Defenses Against Indirect Prompt Injection. Findings of NAACL 2025, 2025. https://arxiv.org/abs/2503.00061
  10. [10]Carlini, N., Tramèr, F., Nasr, M., Hayes, J., Shumailov, I.. The Attacker Moves Second: Adaptive Attacks Bypass LLM Defenses. Google DeepMind / OpenAI / Anthropic, October 2025. https://arxiv.org/abs/2510.09023
  11. [11]OWASP Gen AI Security Project. OWASP Top 10 for Agentic Applications for 2026. OWASP, December 2025. https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
  12. [12]JFrog. CVE-2025-6514: Critical RCE in the mcp-remote Client. JFrog Security Research, July 2025. https://jfrog.com/blog/2025-6514-critical-mcp-remote-rce-vulnerability/
  13. [13]Cymulate. EscapeRoute: CVE-2025-53109 and CVE-2025-53110 in Anthropic's Filesystem MCP Server. Cymulate Research, 2025. https://cymulate.com/blog/cve-2025-53109-53110-escaperoute-anthropic/
  14. [14]Simon Willison. Model Context Protocol has prompt injection security problems. simonwillison.net, April 2025. https://simonwillison.net/2025/Apr/9/mcp-prompt-injection/
Running agents on real data? Have us try to break them first.
Raptoric red-teams tool-using AI across every layer in this report: the model, its tools, the MCP servers behind them, and the autonomy that ties them together. Book a scoping call to see where your agents break.
Book a scoping call