AI agents run with real capabilities: shell access, API keys, database credentials, tool integrations. Users interact with these agents through client apps (Slack bots, web UIs, custom CLIs). Without a mediation layer, every client has a direct channel to a privileged workload, and every agent response is an unfiltered pipe back to the user. The core risk: the agent has more authority than the caller, and the caller can trick it into using that authority.
This post documents a safe default architecture for mediating all agent communication through a gateway, and its trade-offs.
System description
An AI Agent Gateway sits between client applications and AI agents, mediating every request and response. The gateway enforces authorization, inspects payloads in both directions, and logs every interaction. Agents are workloads. The gateway is the control plane. Client apps are untrusted callers.

Architecture choice
There are three common deployment models for mediating agent traffic. The security trade-offs differ for each.
Centralized gateway
A single gateway cluster handles all agent traffic. Every client connects to the gateway; every agent is reachable only through it.
Use this when:
You have a manageable number of agents (tens, not thousands)
You want a single policy enforcement point and audit log
Your team can operate a stateful proxy tier
Main risks: Single point of failure. The gateway is on the critical path for all agent interactions, so an outage blocks all agent access. Scaling the gateway independently of agents requires capacity planning.
Sidecar proxy
Each agent gets a co-located proxy (container sidecar, local process) that enforces policy locally. Clients still connect through a thin ingress layer, but authZ and data inspection happen at the edge of each agent.
Use this when:
Agents are distributed across many environments (developer laptops, on-prem VMs, multiple cloud regions)
You need per-agent policy enforcement without routing all traffic through a central bottleneck
Agent count is high or dynamic (auto-scaling agent pools)
Trade-off: Policy distribution becomes the hard problem. Every sidecar needs a current policy bundle, and stale policies mean stale authZ. You also lose centralized request-level audit unless sidecars ship logs to a collector.
Service mesh
Agents and the gateway are part of a mesh (Istio, Linkerd). mTLS between all participants is automatic. Policy is declared centrally and enforced by mesh-managed proxies.
Use this when:
You already operate a service mesh
You need mTLS everywhere without managing certificates per agent
You want to layer agent-specific policy on top of existing mesh authZ primitives
Trade-off: Mesh infrastructure is operationally heavy. If you don't already have one, adopting a mesh solely for agent governance is overkill.
Common middle ground: Start with a centralized gateway for the first deployment. Add sidecar proxies only for agents that can't route through the central gateway (e.g., on-prem or developer-local agents). Use a mesh only if you already have one.
Golden path
Build this first. Then relax constraints only if you have a specific reason:
Client authenticates → TLS/mTLS to gateway → gateway evaluates authZ policy → inbound data inspection → forward to agent → agent executes and responds → outbound data inspection → deliver response to client
Each step is a gate. A failure at any gate stops the flow and returns an error to the client.
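The golden path can be sketched as a chain of gates, where any failure short-circuits the flow. The gate names, the size limit, and the stand-in agent call below are illustrative, not a reference implementation:

```python
# Sketch of the golden path: each step is a gate that either passes the
# request along or aborts the flow. All names and limits are illustrative.

class GateError(Exception):
    """Any gate failure stops the flow and becomes the client's error."""

def authenticate(req):
    if not req.get("token"):                      # placeholder credential check
        raise GateError("401 unauthenticated")
    return req

def authorize(req):
    if req.get("agent") not in {"code-review-agent"}:   # deny by default
        raise GateError("403 no policy match")
    return req

def inspect_inbound(req):
    if len(req.get("payload", "")) > 64_000:      # payload size gate
        raise GateError("413 payload too large")
    return req

def call_agent(req):
    return {"response": f"handled: {req['payload']}"}   # stand-in for the agent

def inspect_outbound(resp):
    return resp                                   # DLP scan would run here

def handle(req):
    try:
        for gate in (authenticate, authorize, inspect_inbound):
            req = gate(req)
        return inspect_outbound(call_agent(req))
    except GateError as e:
        return {"error": str(e)}
```

A request that fails any gate never reaches the agent: a call with no token returns the 401 error before `call_agent` runs.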
Core design
Identity and authentication
Every client app must authenticate to the gateway. The gateway does not trust the client's claim of user identity; it verifies it. Never derive user identity from client-supplied fields (e.g., user_id in the request body) or forwarded headers (X-User-ID, X-Forwarded-User). Extract identity exclusively from verified credentials (OAuth token, client certificate, platform signature).
Web UIs and custom apps: OAuth 2.0 / OIDC tokens validated at the gateway. The gateway extracts user_id, tenant_id, and roles from the verified token
Slack and chat integrations: The gateway validates the platform's request signature (e.g., Slack's X-Slack-Signature), then maps the platform user ID to an internal principal
mTLS clients: Client certificate CN or SAN provides the identity. The gateway maps the certificate identity to a principal in its policy store
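Slack's v0 signing scheme is documented and small enough to verify inline at the gateway. A minimal check, assuming the signing secret is already loaded from the gateway's secret store (variable names are illustrative):

```python
import hashlib
import hmac
import time

def verify_slack_signature(signing_secret: str, timestamp: str,
                           body: bytes, signature: str,
                           max_age_s: int = 300) -> bool:
    """Verify Slack's X-Slack-Signature (v0 scheme) before trusting
    the platform user ID embedded in the request body."""
    # Reject replayed requests with stale timestamps (Slack suggests 5 minutes)
    if abs(time.time() - int(timestamp)) > max_age_s:
        return False
    # Signature base string is "v0:<timestamp>:<raw body>"
    basestring = b"v0:" + timestamp.encode() + b":" + body
    expected = "v0=" + hmac.new(signing_secret.encode(), basestring,
                                hashlib.sha256).hexdigest()
    # Constant-time comparison to avoid timing side channels
    return hmac.compare_digest(expected, signature)
```

Note the constant-time comparison and the timestamp window: both matter, since the signature check is the only thing standing between a forged request and an internal principal mapping.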
Agents also authenticate to the gateway. Each agent has a unique identity (client certificate, API key, or service account token) that the gateway uses to verify the agent is who it claims to be. Agent credentials are never exposed to clients.
The gateway evaluates policy on every request. Policy answers two questions:
Can this user talk to this agent? A binding between user identity or role and agent identity. Users in the engineering group can access code-review-agent; users in finance can access reporting-agent.
Can this user perform this action? Finer-grained control over what the user can ask the agent to do. A read-only role can query the agent but cannot trigger tool executions that modify state.
Store policy centrally and version it. Changes take effect on the next request (no cache that outlives a policy update). Deny by default: if no policy matches, the request is rejected. If the policy store is unreachable, the gateway must fail closed (reject all requests), not fail open.
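A deny-by-default evaluator with a fail-closed path can be very small. The rule shape and the role/agent names below are illustrative; a real deployment would back this with a versioned central policy store:

```python
# Illustrative policy rules: a binding of role -> agent -> permitted actions.
POLICIES = [
    {"role": "engineering", "agent": "code-review-agent",
     "actions": {"query", "tool_call"}},
    {"role": "finance", "agent": "reporting-agent",
     "actions": {"query"}},
]

def authorize(roles: set, agent: str, action: str, policies=POLICIES) -> bool:
    """Deny by default; fail closed when the policy store is unreachable."""
    if policies is None:          # policy store unreachable: reject everything
        return False
    for rule in policies:
        if rule["role"] in roles and rule["agent"] == agent \
                and action in rule["actions"]:
            return True
    return False                  # no matching rule: deny
```

The `policies=None` branch is the fail-closed behavior from above: an unreachable policy store rejects every request rather than letting stale or absent policy default to allow.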
Data inspection
The gateway inspects payloads flowing in both directions.
Inbound inspection (client → agent):
Reject requests containing patterns that indicate prompt injection targeting the agent's tool-use capabilities (e.g., instructions to ignore system prompts, requests to execute commands not permitted by the user's role)
Enforce payload size limits to prevent resource exhaustion on the agent
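A sketch of the inbound gate, assuming a best-effort pattern list. The patterns and size limit are illustrative; pattern matching catches known injection templates, not novel phrasings:

```python
import re

MAX_PAYLOAD_BYTES = 64_000   # illustrative limit

# Known injection templates; this is best-effort screening, not a guarantee
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"disregard (the )?system prompt", re.I),
]

def inspect_inbound(payload: str):
    """Return ("allow", None) or ("block", reason) for a client payload."""
    if len(payload.encode()) > MAX_PAYLOAD_BYTES:
        return ("block", "payload too large")
    for pat in INJECTION_PATTERNS:
        if pat.search(payload):
            return ("block", f"injection pattern: {pat.pattern}")
    return ("allow", None)
```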
Outbound inspection (agent → client):
Scan responses for sensitive data patterns: API keys, credentials, PII, internal hostnames, file paths, database connection strings
Redact or block responses that match DLP rules before they reach the client
Flag responses where the agent attempted to return data outside the scope of the user's query
For streaming responses (SSE, WebSocket), buffer the complete response and scan it before delivery. If latency constraints require chunked delivery, accept the partial-leak risk
If the DLP engine is unavailable, the gateway should block responses rather than delivering uninspected content. Accept the availability hit.
Data inspection is not a silver bullet. Pattern-based DLP has false positives and false negatives. The goal is to catch accidental leakage and obvious exfiltration, not to prevent a determined adversary who controls the agent runtime.
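A minimal outbound scan along those lines, assuming a small rule set of known secret formats (real DLP rule sets are much larger; these two patterns are illustrative):

```python
import re

# Illustrative secret-format patterns
DLP_RULES = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "postgres_dsn": re.compile(r"postgres(ql)?://\S+:\S+@\S+"),
}

def scan_outbound(text: str):
    """Return (verdict, matched rule names, redacted text)."""
    verdict, matched = "pass", []
    for name, pat in DLP_RULES.items():
        if pat.search(text):
            text = pat.sub("[REDACTED]", text)   # redact before delivery
            matched.append(name)
            verdict = "redacted"
    return verdict, matched, text
```

A "block" verdict (withhold the whole response) would be a third outcome for rules too sensitive to redact; the pass/redacted/blocked split mirrors what the audit log records below.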
Audit logging
The gateway logs every request and response:
timestamp, request_id, user_id, tenant_id, agent_id, action
AuthZ decision (allow/deny + policy rule that matched)
DLP scan result (pass/redacted/blocked + rule that triggered)
Truncated request / response payloads (configurable: omit for sensitive workloads, include for compliance-heavy ones)
Scrub tool-call arguments from logs when they may contain credentials or secrets (e.g., API keys passed as tool parameters)
Logs are append-only. Ship them to a centralized log store. They are the primary artifact for incident investigation and compliance audits.
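One way to shape the audit record, with payload truncation and secret scrubbing applied before the line is written. Field names follow the list above; the scrub rule and truncation limit are illustrative:

```python
import json
import re
import time
import uuid

# Illustrative scrub rule; reuse the same patterns as the outbound DLP scan
SECRET_PATTERN = re.compile(r"\bAKIA[0-9A-Z]{16}\b")

def audit_record(user_id, tenant_id, agent_id, action,
                 authz, dlp, payload, max_payload=256) -> str:
    """Build one append-only audit line as JSON; scrub secrets and
    truncate the payload before it ever reaches the log store."""
    scrubbed = SECRET_PATTERN.sub("[SCRUBBED]", payload)[:max_payload]
    return json.dumps({
        "timestamp": time.time(),
        "request_id": str(uuid.uuid4()),
        "user_id": user_id,
        "tenant_id": tenant_id,
        "agent_id": agent_id,
        "action": action,
        "authz": authz,     # e.g. {"decision": "allow", "rule": "eng->code-review"}
        "dlp": dlp,         # e.g. {"result": "pass", "rule": None}
        "payload": scrubbed,
    })
```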
Agent-to-agent isolation
Agents must not communicate with each other directly. If agent A needs to invoke agent B, that request goes through the gateway as a new request with agent A's identity, subject to the same authZ and data inspection as any client request. Enforce isolation with network controls: agents accept inbound connections only from the gateway (security group, network policy, firewall rule). Agents should not be routable from client networks or from each other.
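Enforced at the gateway, an agent-to-agent call looks like any other request: the caller's principal is agent A's own identity, and the same deny-by-default binding applies before anything is routed. A sketch with illustrative agent names:

```python
# Which agents may call which; deny by default, like user-to-agent policy.
AGENT_BINDINGS = {("agent-a", "agent-b")}

def forward_agent_call(caller_agent: str, target_agent: str, payload: str):
    """Route an agent-to-agent request through the gateway as a new
    request with the calling agent's identity."""
    if (caller_agent, target_agent) not in AGENT_BINDINGS:
        raise PermissionError(f"{caller_agent} may not call {target_agent}")
    # Inbound inspection and audit logging would run here, exactly as
    # they do for a client-originated request
    return {"routed_to": target_agent, "from": caller_agent, "payload": payload}
```

The network controls described above are what make this the only path: if agents can reach each other directly, this check is advisory.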
Threat model
Baseline assumptions
Clients are untrusted: they can craft arbitrary payloads, replay requests, and impersonate other users if authentication is weak
Agents are semi-trusted: they run your code and have real capabilities, but their outputs are not trusted for data safety (LLM outputs are non-deterministic and can contain leaked context)
Compromise of the gateway is out of scope for this model. The gateway concentrates AuthN, AuthZ, DLP, credential injection, and audit; restrict admin access, version policy changes, and monitor for tampering as you would any Tier-0 control plane
Network between gateway and agents is private (VPC, WireGuard, or mTLS), not the open internet
Agents do not share conversational state across tenants. If multiple tenants use the same agent pool, the gateway (or the agent runtime) enforces per-tenant context isolation. Without this, an agent that retains state across requests can return one tenant's context in another tenant's response
Standard infra controls such as TLS termination, WAF, database AuthN, and OS-level hardening are assumed to be in place. This model focuses on the agent communication pattern
A note on risk: you won’t fix everything
This table isn’t a checklist where every row must be fully eliminated. Focus on preventing the worst failures and limiting blast radius. In practice: ship prevention for the High rows first, then add monitoring and response for what you can’t realistically prevent.
Phase 1: Client to gateway
Focus: Preventing unauthorized access to agents and injection of malicious payloads
| Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Agent access | AuthZ bypass: Attacker crafts request to reach an agent they're not permitted to use (e.g., manipulating routing headers or agent IDs) | AuthZ policy evaluation on every request | 1. Deny by default: Reject if no explicit policy match 2. Routing lockdown: Clients address agents by logical name only; the gateway resolves to internal endpoints and rejects requests containing direct agent addresses | High |
User identity | Session hijack: Attacker steals OAuth token or session cookie to impersonate a legitimate user | Token validation at gateway | 1. Short-lived tokens: Access tokens expire in minutes, not hours 2. Binding: Bind sessions to client fingerprint (IP, TLS session) where feasible 3. Revocation: Support immediate token revocation via introspection or short cache TTL | Medium |
Agent runtime | Prompt injection via client: User crafts input designed to override the agent's system prompt, causing it to execute unintended tool calls | Input validation at gateway | 1. Input scanning: Pattern-match known injection templates (e.g., "ignore previous instructions") 2. Tool governance: See tool execution row in Phase 2 3. Agent hardening: Agents use structured tool-call interfaces, not free-form command parsing | High |
Gateway availability | Request flooding: Attacker sends high volume of requests to exhaust gateway resources | Rate limiting | 1. Per-user rate limits: Throttle by authenticated identity 2. Per-agent limits: Protect individual agents from traffic spikes 3. Backpressure: Gateway returns 429 and sheds load before agents are affected | Medium |
Phase 2: Gateway to agent
Focus: Securing the communication channel and preventing credential leakage
| Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Agent credentials | Credential leak: Agent API keys or service account tokens are exposed in logs, error messages, or configuration | Credentials stored in secrets manager | 1. Injection: Gateway injects agent credentials at request time; clients never see them 2. Rotation: Automate credential rotation; detect stale credentials 3. Redaction: Scrub credentials from all log outputs | High |
Tool execution | Unauthorized tool call: User or injected prompt triggers a tool invocation (shell, SQL, HTTP) beyond the user's permitted scope, causing data modification or lateral movement | AuthZ policy evaluation | 1. Tool allow-list: Gateway maintains per-role allow-lists of permitted tool types and blocks unlisted calls before they reach the agent 2. Structured intents: Agents emit structured tool-call intents (tool name, arguments as typed fields); gateway parses and validates before execution 3. Parameter validation: Gateway enforces argument constraints (e.g., allowed hostnames for HTTP fetch, read-only for SQL) | High |
Agent identity | Agent spoofing: Rogue process registers as a legitimate agent and receives user requests | Agent authentication required | 1. mTLS: Require client certificates for agent-to-gateway connections 2. Registration: Agents must be registered in the gateway's agent inventory before receiving traffic 3. Health checks: Gateway periodically verifies agent identity and liveness | Medium |
Request integrity | Tampering: Man-in-the-middle modifies request payloads between gateway and agent | Private network / VPC | 1. mTLS: Encrypt and authenticate all gateway-to-agent traffic 2. Signing: Gateway signs forwarded requests; agent verifies signature before processing | Low |
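The tool-governance mitigations in the table combine naturally: the agent emits a structured intent, and the gateway checks the tool name against a per-role allow-list and validates arguments before anything executes. The roles, tool names, and the SELECT-prefix check below are illustrative (a real read-only check needs a SQL parser, not a prefix test):

```python
# Illustrative per-role tool allow-lists and argument constraints.
TOOL_ALLOWLIST = {
    "read-only": {"sql_query", "http_fetch"},
    "operator": {"sql_query", "http_fetch", "shell"},
}

ALLOWED_FETCH_HOSTS = {"internal-api.example.com"}   # egress allow-list

def validate_tool_call(role: str, intent: dict):
    """Validate a structured tool-call intent before it reaches execution.
    Returns (allowed, reason)."""
    tool = intent.get("tool")
    if tool not in TOOL_ALLOWLIST.get(role, set()):
        return False, f"tool {tool!r} not permitted for role {role!r}"
    args = intent.get("args", {})
    if tool == "http_fetch" and args.get("host") not in ALLOWED_FETCH_HOSTS:
        return False, f"host {args.get('host')!r} not on egress allow-list"
    if tool == "sql_query" and role == "read-only":
        # Crude read-only check for the sketch; use a real parser in practice
        if not args.get("statement", "").lstrip().upper().startswith("SELECT"):
            return False, "read-only role may only run SELECT"
    return True, "ok"
```

Because the intent is typed fields rather than free-form text, the gateway can reject a disallowed call without having to interpret natural language, which is what makes this check enforceable.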
Phase 3: Agent response to client (via the gateway)
Focus: Preventing data exfiltration and ensuring response integrity
| Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Sensitive data | Data exfil via response: Agent returns API keys, database credentials, PII, or internal infrastructure details in its natural-language response | Outbound DLP scan | 1. Pattern matching: Scan for known secret formats (AWS keys, connection strings, SSNs) 2. Redaction: Replace matched patterns with a placeholder before delivery 3. Alerting: Notify security team when DLP rules trigger repeatedly for the same agent | High |
Response scope | Over-fetching: Agent retrieves and returns data beyond what the user's role permits (e.g., agent has broad DB access but user should only see their own records) | AuthZ at gateway | 1. Scoped context: Gateway passes the user's permission scope to the agent as part of the request context 2. Structured responses: Where agents return structured data (JSON, SQL results), validate response fields against the user's authorized scope. For free-form text responses, this control does not apply; rely on scoped agent credentials instead 3. Least privilege agents: Run agents with minimal credentials scoped to the task | Medium |
Audit trail | Silent exfil: Data leaves through agent tool calls (HTTP requests, file writes) rather than through the response path | Agent in private network | 1. Egress control: Restrict agent outbound network access to an allowlist 2. Tool-call logging: Log all tool invocations and their arguments 3. Alerting: Alert on unexpected egress destinations or high-volume data tool calls | High |
Client trust | Response injection: Agent response contains executable content (scripts, links) that the client renders unsafely | Client-side rendering controls | 1. Content typing: Gateway sets response content type to plain text / structured JSON 2. Sanitization: Strip HTML / script tags from natural-language responses 3. Client hardening: Client apps treat agent responses as untrusted content (no eval, no innerHTML) | Low |
If you use a sidecar or mesh instead
If you deploy sidecars instead of a centralized gateway, the threat profile shifts:
Policy staleness becomes a primary risk: Each sidecar enforces a local copy of the policy. If policy distribution lags, users may retain access after revocation or lose access prematurely
Audit aggregation is harder: Logs are distributed across sidecars. You need a reliable log shipping pipeline; gaps mean blind spots in incident response
Agent-to-agent isolation requires mesh policy: Without a central chokepoint, you rely on mesh-level network policy to prevent direct agent-to-agent communication. Misconfigured mesh policy can silently allow lateral movement
DLP consistency is harder to maintain: DLP rules must be distributed to every sidecar. Version drift between sidecars means inconsistent data inspection
Sidecar failure mode matters: If a sidecar loses contact with the policy distributor, it must fail closed (block all requests). A sidecar that fails open exposes the agent to unauthenticated traffic until policy is restored
The sidecar model is not less secure. It trades centralized simplicity for distributed resilience, but the operational cost of keeping distributed policy and DLP in sync is real.
Verification checklist
Authentication
Unauthenticated requests to the gateway return 401 before reaching any agent
Slack / chat integration requests are rejected if platform signature validation fails
mTLS clients with expired or unknown certificates are rejected at the TLS handshake
Authorization
A user with no explicit policy binding receives 403 when addressing any agent
Removing a user's role binding immediately prevents access on the next request (no stale cache)
Agent logical names are resolved by the gateway; direct agent endpoint addresses in client requests are rejected
Data inspection (inbound)
Oversized payloads return 413 before forwarding
Known prompt-injection patterns in request payloads trigger a block or flag (verified with test payloads)
Data inspection (outbound)
Agent responses containing test secret patterns (e.g., the AKIA... AWS key format) are redacted before delivery to the client
DLP rule triggers are logged with the request_id, agent_id, and matched rule
Agent isolation
Agent A cannot send a request to Agent B without that request passing through the gateway's authZ and DLP pipeline
Only agents registered in the gateway's agent inventory receive traffic; unregistered agents are rejected
Agents cannot reach the gateway's admin API or policy store
Credential management
Agent credentials (API keys, service account tokens) never appear in gateway logs or client-visible error messages
Credential rotation requires zero gateway downtime
Audit and detection
Every authZ decision (allow and deny) is logged with user_id, agent_id, action, and policy rule
DLP scan results (pass, redact, block) are logged for every response
Alerts fire when a single user triggers more than N DLP blocks within a time window
Implementation & Review
The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
