AI SecurityJune 13, 2026 · 6 min read

Securing AI agents: the new attack surface of agentic AI

When AI can take actions, a manipulated model becomes a manipulated actor. See why agentic AI is a new attack surface and how to secure your agents.

An AI security engineer reviewing an agent's tool calls and action logs on a monitoring dashboard.

Written by

Alen Bosanac

Offensive Security

AI agent security is the practice of protecting AI systems that can take actions, not just generate text. An agent is a model given tools, autonomy, and a goal: it can query databases, call APIs, send emails, run code, browse the web, or trigger workflows on its own. That capability is exactly what makes agentic AI useful, and exactly what makes it dangerous. The moment a model can act, a manipulated model becomes a manipulated actor, and a successful prompt injection stops being an information problem and becomes an action problem.

This is the fastest-growing area of AI security, because organizations are moving from chatbots that answer questions to agents that do work. The security implications are not well understood, and the controls that contain a simple chatbot are not enough for a system that can move money or change records. This article explains why agents are a distinct attack surface, how they get exploited, and how to secure them, drawing on the work we do through our AI security service.

What is an AI agent?

An AI agent is a model wrapped in a loop that lets it plan and act toward a goal, using tools to affect the world beyond text. Where a traditional chatbot takes input and returns output, an agent decides what to do next, calls a tool, observes the result, and continues until it reaches its objective. Tools might include a database query, an email send, a code execution environment, a payment API, or another agent. Multi-agent systems, where several agents coordinate, add further complexity.

This autonomy is what delivers value: an agent can resolve a support ticket end to end, reconcile records, or carry out a multi-step task without a human in the loop. But every tool an agent can use is also an action an attacker may be able to trigger if they can influence the agent's behavior.

Why agentic AI is a new attack surface

Agents combine two things that are dangerous together: susceptibility to manipulation through language, and the ability to take real-world actions. Several properties make them especially hard to secure.

A manipulated agent acts, so the consequence of a successful attack is not a bad answer but an unwanted action with real effects.
Agents process untrusted data, such as emails, documents, and web pages, any of which can carry hidden instructions through indirect prompt injection.
Autonomy reduces human oversight, so an attack can complete before anyone reviews what the agent did.
Tool permissions are often too broad, granted for convenience and never scoped down, so a compromised agent can reach far more than its task requires.
Multi-agent and chained systems expand the blast radius, because one manipulated agent can influence others.

A chatbot that is tricked says the wrong thing. An agent that is tricked does the wrong thing. The difference between those two is the entire problem of agent security.

How AI agents get exploited

Most agent attacks start with influencing the agent's input and end with abusing the actions it can take. The common patterns include the following.

Indirect prompt injection, where an attacker plants instructions in data the agent processes, such as a document, email, or webpage, and the agent follows them.
Tool abuse, where a manipulated agent is steered into using its tools for the attacker's ends, such as exfiltrating data or making unauthorized changes.
Confused-deputy attacks, where the agent uses its own legitimate permissions on behalf of an attacker who could not act directly.
Excessive agency, where the agent is allowed to take consequential actions without confirmation, so a single manipulation causes real damage.
Memory and context poisoning, where an attacker corrupts the information the agent relies on across steps or sessions.

How to secure AI agents

Because agent attacks end in actions, the most important controls constrain what the agent can do, not just what it can say. The goal is that being manipulated is not enough to cause harm.

Scope tool permissions tightly, so each agent has only the access its task genuinely requires and nothing more.
Require confirmation for consequential actions, so high-impact steps such as payments or deletions are not fully autonomous.
Treat all retrieved and tool-returned content as untrusted, and design so that instructions found in data are never executed as commands.
Put hard boundaries outside the model, enforced in code the attacker cannot talk to, rather than relying on the prompt to hold.
Validate agent actions and outputs before downstream systems trust them.
Log every action with full context, so an attack can be detected, investigated, and reconstructed.

These controls are design decisions, not settings, which is why agent security has to be built into the architecture. We design and validate them as part of an engagement, and test whether they hold through AI red teaming and AI penetration testing.

The confused deputy problem

A useful way to think about agent security is the confused deputy: a program that uses its own legitimate authority on behalf of someone who should not have it. An agent with access to your database and your email is a powerful deputy. If an attacker can influence it through a poisoned document, the agent may use its real permissions to do the attacker's bidding, and from the system's point of view the actions look authorized. The defense is to assume the agent will eventually be confused and design so that confusion alone cannot cause harm, by constraining permissions and requiring confirmation for anything consequential.

Agents, detection, and response

Prevention is necessary but not sufficient, because no guardrail is perfect and agents act quickly. The ability to detect anomalous agent behavior, an unusual sequence of tool calls, access to data outside the task, actions at odd times, is part of the defense. Complete logging makes this possible, and is also what lets you investigate and reconstruct an incident afterward. We cover detection and response for production systems through our detection and response service.

Agents and regulation

Agentic systems that take consequential actions are exactly the kind of high-risk AI the EU AI Act is concerned with, requiring robustness, human oversight, and security appropriate to the risk. The NIST AI Risk Management Framework, explained in our NIST AI RMF guide, similarly emphasizes oversight and measurement. Designing agents with constrained permissions, human confirmation, and full logging is both good security and evidence of the oversight regulators expect, as covered on our EU AI Act page.

Agentic AI is where the value and the danger of AI meet, because the same autonomy that makes agents useful makes them powerful in an attacker's hands. If you are deploying agents or giving models access to tools and data, see our AI security service and book a scoping call to discuss securing them.

Frequently asked questions

What is AI agent security?

AI agent security is the protection of AI systems that can take actions through tools, such as querying databases, sending emails, or executing code. Because a manipulated agent can act, securing it focuses on constraining what it can do, not only what it can say.

Why are AI agents more dangerous than chatbots?

A chatbot that is manipulated produces a bad answer. An agent that is manipulated takes a real action, such as moving data or making a change, with real consequences. The ability to act turns a content problem into an action problem.

How do you secure an AI agent?

Scope tool permissions tightly, require confirmation for consequential actions, treat retrieved content as untrusted, enforce hard boundaries outside the model, validate outputs, and log every action. The aim is that manipulating the agent is not enough to cause harm.

Can prompt injection affect AI agents?

Yes, and it is the central risk. Indirect prompt injection, where instructions are hidden in data the agent processes, can steer the agent into abusing its tools. Because the agent can act, the impact is greater than with a text-only chatbot.

Sources

1OWASP. OWASP Top 10 for LLM Applications. Open Worldwide Application Security Project, 2025. Link
2NIST. AI Risk Management Framework (AI RMF 1.0). National Institute of Standards and Technology, 2023. Link
3ENISA. Multilayer Framework for Good Cybersecurity Practices for AI. European Union Agency for Cybersecurity, 2023. Link

Related service

AI Security

→

Want this tested on your own systems?

Our team will scope it with you on a 30-minute call.

Book a scoping call

Keep reading

All insights →

01AI Security

AI penetration testing: how to test LLM apps, agents, and RAG

Read →8 min read

02AI Security

AI red teaming: a practical guide for security teams

Read →7 min read

03AI Security

MCP security: risks of the Model Context Protocol and how to manage them

Read →5 min read