An MCP server provides the tool description and input schema. The LLM uses them to decide how and when to call the tool. A server that can change a description after approval can change agent behavior without changing application code.
This post focuses on the approval flow for MCP tool descriptions and the threats it addresses.
System description
The MCP client connects to a server and presents each returned tool's full description and input schema to the user for per-tool approval. The client hashes the approved description and schema against a stable server identity. On every subsequent connection it recomputes the hashes; any change blocks the tool until the user re-approves.

Architecture choice
There are two reasonable places to run the approval gate.
Client-side approval flow
Each MCP client (e.g. IDE or desktop chat app) tracks its own approvals and hashes for the servers it connects to. State lives on disk next to the client config.
Use this when:
Servers run as local stdio subprocesses for one developer
The agent is a single-user assistant on a workstation
You want changes to propagate at the speed the user upgrades clients
Trade-off: every client maintains its own approval state, so a server change forces every user to re-approve independently. Drift between clients is normal, and a stale client may keep approving a description newer clients have already flagged.
Gateway-mediated approval flow
A central MCP gateway or proxy sits between agents and servers. It hashes descriptions once per server and enforces the gate on behalf of every downstream client.
Use this when:
Multiple production agents share the same MCP servers
A platform team owns approvals and wants one place to revoke a tool
You need a consistent audit trail across many callers
Main risks: the gateway becomes a high-value target, and a stale cache can mask a real description change.
Common middle ground: gateway-mediated approvals for production agents, client-side approval for local development.
Golden path
Build this first. Then relax constraints only if you have a specific reason:
Connect to server → receive tool list → display full description and schema → approve per tool → store approval hash by server identity → re-check on every connection → block on mismatch → log tool calls with active description hash
Related patterns:
If you need a broader threat model for autonomous agents beyond MCP itself, see The AI Agent Attack Surface
If you are running shared production agents, put the approval gate behind AI Agent Gateway so one mediation layer enforces it for every caller
If retrieved documents reach the LLM through MCP tools, pair this with RAG Access Control Threat Model so retrieved content respects source permissions before it lands in the prompt
Core design
Approval gate (client)
The gate sits between tools/list and any tool call. For every tool the server returns, it presents the user with the full description text and input schema exactly as the LLM will receive them, with no truncation or summarization. It then records an explicit per-tool approval as (server_id, tool_name, approval_hash, approved_at, approved_by).
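The per-tool approval record described above can be sketched as a small immutable structure; the field names follow the tuple in the text, and everything else (how the hash is computed, where the record is stored) is assumed to live elsewhere:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolApproval:
    server_id: str      # stable server identity, per the binding rules below
    tool_name: str
    approval_hash: str  # SHA-256 over the canonical approval document
    approved_at: str    # ISO 8601 timestamp of the explicit user approval
    approved_by: str    # user or operator who reviewed the full text
```

Keeping the record frozen makes accidental in-place mutation of stored approvals a type error rather than a silent drift source.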
Approval store (storage)
Hashes are SHA-256 over a JSON document with four fields: server_id, tool_name, description, and input_schema.
Serialize this document using RFC 8785 JSON Canonicalization Scheme (JCS) so the same content produces the same hash across runtimes. Avoid hand-rolled canonicalization; differences in Unicode escaping or float serialization can create false mismatches.
Store the description exactly as received from the server. Do not trim or reformat it.
Key the approval store by server_id. The identity rules are defined below.
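The hash computation can be sketched as follows. Note the serialization here uses `json.dumps` with sorted keys purely to illustrate the shape of the four-field document; as the text says, a real implementation should use an RFC 8785 (JCS) library rather than hand-rolled canonicalization:

```python
import hashlib
import json

def approval_hash(server_id: str, tool_name: str,
                  description: str, input_schema: dict) -> str:
    # The four-field approval document. The description is hashed exactly
    # as received: no trimming, no reformatting.
    doc = {
        "server_id": server_id,
        "tool_name": tool_name,
        "description": description,
        "input_schema": input_schema,
    }
    # Stand-in serialization for illustration only; substitute a proper
    # RFC 8785 JCS serializer so hashes agree across runtimes.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the description is preserved byte-for-byte, even a whitespace-only change produces a different hash and trips the gate.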
Static descriptions only
Tool descriptions must be static across connections. If a server includes dynamic state (current table names, environment IDs, row counts), the hash will change on every refresh and force repeated re-approval.
Move dynamic state into a separate list_resources or get_context tool. Keep the description constant.
Reconnection check
On every connection, the client recomputes hashes for the returned tool list and compares against the store. There are three outcomes:
Match: the tool is enabled and callable
Mismatch: the tool is disabled until the user re-approves; the diff is shown
New tool not in the store: treated as a first-time approval
A tool that appears mid-session is treated the same as a first-time tool. MCP servers can push a notifications/tools/list_changed event at any time, including mid-loop. The client must suspend the autonomous agent until the new toolset clears the gate.
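The three reconnection outcomes can be sketched as a single lookup against the approval store (here a plain dict keyed by `(server_id, tool_name)`; the store shape and function names are assumptions):

```python
from enum import Enum

class GateResult(Enum):
    ENABLED = "enabled"                 # hash matches the stored approval
    BLOCKED = "blocked"                 # mismatch: disable and show a diff
    NEEDS_APPROVAL = "needs_approval"   # unknown tool: first-time approval

def check_tool(store: dict, server_id: str, tool_name: str,
               current_hash: str) -> GateResult:
    stored = store.get((server_id, tool_name))
    if stored is None:
        # New tool, including one pushed mid-session: fail closed
        # until the user approves it.
        return GateResult.NEEDS_APPROVAL
    if stored == current_hash:
        return GateResult.ENABLED
    return GateResult.BLOCKED
```

Anything other than `ENABLED` must keep the tool uncallable; the autonomous loop stays suspended until the gate clears.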
Server identity binding
The approval is bound to a server identity, not just a tool name. For stdio servers, that identity is the package name and pinned version (@org/[email protected]). For HTTP servers, it is the origin paired with an authenticated server identity (OAuth client identity per the recent MCP spec revision) and a server-reported version. Switching either field invalidates prior approvals.
Tool call log
Every tool dispatch records server_id, tool_name, approval_hash at call time, the arguments, and a result summary.
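A minimal sketch of the log entry, emitted as one JSON line per dispatch (suitable for an append-only JSONL file; the field names mirror the list above):

```python
import json
from datetime import datetime, timezone

def tool_call_record(server_id: str, tool_name: str, approval_hash: str,
                     arguments: dict, result_summary: str) -> str:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "server_id": server_id,
        "tool_name": tool_name,
        "approval_hash": approval_hash,  # the hash active at call time
        "arguments": arguments,
        "result_summary": result_summary,
    }
    return json.dumps(entry, sort_keys=True)
```

Recording the hash at call time, not just at approval time, is what lets a later rug-pull investigation reconstruct which description the LLM was actually reasoning over for each call.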
Threat model
Baseline assumptions
The MCP server is untrusted: it controls tool names, descriptions, schemas, and return values. A gateway centralizes approval and enforcement; it does not make the server trusted
The MCP client and its approval store are trusted. The user or operator is trusted to approve tools, but the UI must make meaningful review possible (full text rendered, diff on change, no truncation by default)
The LLM treats the full description text as part of its reasoning context and may follow instructions embedded in it
Approval is per-tool, not per-server. A user approving one tool does not approve every tool the server lists now or later
Output-channel prompt injection (e.g. malicious tool return values or error messages) is a separate threat addressed by an output-handling pattern, not by this approval flow
Standard infra controls such as TLS and secret management are assumed to be in place
A note on risk
This table is not a checklist. Focus on preventing the highest-impact failures first. Detection and response are acceptable where prevention is impractical.
Phase 1: First-time approval
Focus: Catching a malicious description before the tool is callable
Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Agent behavior | Tool poisoning: Server embeds hidden instructions in the description text that steer the agent toward exfiltration or unauthorized side effects, while the user-visible summary stays benign | UI shows tool name and short summary | 1. Render the full description and schema at approval time, byte-identical to what the LLM receives 2. Per-tool approval, not blanket server approval 3. Static checks on description text for instruction-shaped patterns flagged for the user before approval | High |
Cross-server context | Cross-server shadowing: A second MCP server's description includes instructions that redirect the agent's use of an already-trusted server's tools, without the malicious tool ever being called | Per-server approval | 1. Review descriptions from all connected servers as a combined set during approval, not one server at a time 2. Cap the number of simultaneously connected servers and re-approve when the set changes 3. Run untrusted or unfamiliar servers in a separate agent instance with an isolated LLM context | High |
Tool schema | Schema manipulation: Server defines a catch-all input schema (untyped string blob or generic object) that lets the LLM dump its working context, including data retrieved from other trusted tools, into the attacker's server | Schema visible at approval | 1. Constrain argument types to the minimum required for the documented use case; reject untyped string blobs at approval review 2. Hash the schema alongside the description so changes trigger re-approval 3. Sample tool call arguments in the log for unexpected payload sizes or secret-shaped content | Medium |
Agent context budget | Description bloat: Server returns descriptions or schemas large enough to crowd out useful context or spike per-call token cost on every call | UI rendering of full text | 1. Enforce a hard byte and token cap on each tool's description and schema before approval and before hashing; reject oversized payloads at the gate 2. Cap the aggregate description size across all connected servers 3. Alert when a server's description size grows by more than a small percentage between connections, even when the user accepts the diff | Medium |
Phase 2: Subsequent connections and lifecycle
Focus: Ensuring an approved tool stays the tool that was approved
Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Approved tool definition | Rug pull: Server ships a benign description at first launch, then mutates it on a later connection after the user has already approved | One-time approval | 1. Recompute description and schema hashes on every connection and gate the tool on a match 2. Block the tool and require explicit re-approval on any mismatch, with a diff of old and new text 3. Pin server identity to a package version or version-tagged origin so a real change reads as a new identity | High |
Approval store | Identity collision: Server changes its identifying metadata (name or origin or version) in a way that lands on a different server_id, shedding its approval history while still looking familiar to the user | Server identity stored as part of approval | 1. Treat any new server_id as unapproved, even when it closely resembles a previously approved one 2. On mismatch between server-reported version and a previously seen origin, surface the change instead of auto-promoting it 3. Periodic operator review of the approval store for orphaned or duplicate server entries | Medium |
Tool call log | Silent invocation: A tool runs without a record that ties the call to the approval hash that was active at the time, so a later rug pull can't be reconstructed | Generic request logs | 1. Log server_id, tool_name, and the active approval_hash on every tool dispatch 2. Retain logs long enough to span typical disclosure windows (months, not days) 3. Alert on an approval-hash change between connections even when the user re-approves the new value | Low |
Agent execution loop | Mid-loop toolset swap: A server pushes notifications/tools/list_changed while the agent loop is running, swapping in a mutated toolset between dispatches | Gate enforced on initial connect | 1. Suspend the agent loop on any list_changed notification until the refreshed toolset clears the gate 2. Version the approved toolset and require version match before every dispatch 3. Fail closed if gate state is unknown or pending | Medium |
If you run gateway-mediated approval
If a central gateway holds the approval state for many clients, the threat profile shifts:
The gateway's hash recomputation cadence becomes the effective re-check frequency for every downstream client
A compromise of the gateway compromises every downstream agent's approval gate, so the gateway's own integrity becomes the high-value asset
Per-tool approval can still be enforced, but the user prompt now lives in an operator console rather than the end user's IDE; reviewers must understand that they are approving on behalf of the agent's principal
Cross-server shadowing review happens once at the gateway instead of in each client
FAQs
Do I need this if all my MCP servers are internal?
Internal servers narrow the threat to insider risk and supply-chain compromise of the packages or images you build the servers from, but they do not eliminate it. A description shipped by a teammate's PR or pulled from a private registry is still a description the LLM will follow.
How is this different from generic supply-chain review?
Generic dependency review checks code that runs on your hosts. The approval flow checks a different artifact: the text and schema the LLM will reason over. Two MCP server versions can have identical code review approval and ship different tool descriptions, and only the description hash will catch that.
Verification checklist
Approval gate
Open the approval prompt; confirm the full description and schema are visible without truncation or scroll-to-reveal
Approve one tool from a multi-tool server; no other tool from that server is callable
Have the server push notifications/tools/list_changed mid-session; the new tool is not callable until it clears the gate
Hashing and storage
The hash is SHA-256 over an RFC 8785 (JCS) serialization of an approval document holding server_id, tool_name, description, and input_schema; the description field is preserved byte-for-byte so whitespace changes always change the hash
Re-version a previously-approved server; previous approvals do not auto-apply to the new server_id
Restart and upgrade the client; the approval store survives both
Reconnection check
Disconnect and reconnect a previously-approved server; tool calls fail until hash verification completes
On a hash mismatch, the affected tool stays disabled and the user sees a diff rather than a yes/no prompt
Connect a server with a renamed identity; no prior approval is inherited and the gate fires as first-time
Simulate approval store or hash failure; tool execution fails closed and no tool is callable
Multi-server review
When more than one server is connected, the approval UI shows descriptions from both in a single review step before enabling tools
The number of simultaneously connected servers is bounded and reviewed; untrusted servers run in a separate agent instance
Audit trail
Every tool call logs server_id, tool_name, the approval_hash active at call time, arguments, and a result summary
An approval-hash change between connections is logged and alerted even when the user accepts the re-approval prompt
Logs are retained long enough to investigate a description rotation discovered weeks later
Attacker tests
Description swap: change the description text on a server and reconnect; the tool should stay disabled until re-approval
Schema swap: widen an input schema (add a free-form context field) and reconnect; the same gate should fire
Identity swap: rename or re-version the server and confirm previously approved tools are not auto-trusted under the new server_id
Mid-session tool addition: have the server return an extra tool on a refreshed tools/list and confirm it is gated, not silently added
Shadowing rehearsal: connect a second server whose description references another server's tool by name and verify the combined-review prompt surfaces it
Implementation & Review
The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
