An MCP server provides the tool description and input schema. The LLM uses them to decide how and when to call the tool. A server that can change a description after approval can change agent behavior without changing application code.
This post focuses on the approval flow for MCP tool descriptions and the threats it addresses.
System description
The MCP client connects to a server and presents each returned tool's full description and input schema to the user for per-tool approval. The client hashes the approved description and schema against a stable server identity. On every subsequent connection it recomputes the hashes; any change blocks the tool until the user re-approves.

Architecture choice
There are two reasonable places to run the approval gate.
Client-side approval flow
Each MCP client (e.g. IDE or desktop chat app) tracks its own approvals and hashes for the servers it connects to. State lives on disk next to the client config.
Use this when:
Servers run as local stdio subprocesses for one developer
The agent is a single-user assistant on a workstation
You want changes to propagate at the speed the user upgrades clients
Trade-off: every client maintains its own approval state, so a server change forces every user to re-approve independently. Drift between clients is normal, and a stale client may keep approving a description newer clients have already flagged.
Gateway-mediated approval flow
A central MCP gateway or proxy sits between agents and servers. It hashes descriptions once per server and enforces the gate on behalf of every downstream client.
Use this when:
Multiple production agents share the same MCP servers
A platform team owns approvals and wants one place to revoke a tool
You need a consistent audit trail across many callers
Main risks: the gateway becomes a high-value target, and a stale cache can mask a real description change.
Common middle ground: gateway-mediated approvals for production agents, client-side approval for local development.
Golden path
Build this first. Then relax constraints only if you have a specific reason:
Connect to server → receive tool list → display full description and schema → approve per tool → store approval hash by server identity → re-check on every connection → block on mismatch → log tool calls with active description hash
Related patterns:
If you need a broader threat model for autonomous agents beyond MCP itself, see The AI Agent Attack Surface
If you are running shared production agents, put the approval gate behind AI Agent Gateway so one mediation layer enforces it for every caller
If retrieved documents reach the LLM through MCP tools, pair this with RAG Access Control Threat Model so retrieved content respects source permissions before it lands in the prompt
Core design
Approval gate (client)
The gate sits between tools/list and any tool call. For every tool the server returns, it presents the user with the full description text and input schema exactly as the LLM will receive them, with no truncation or summarization. It then records an explicit per-tool approval as (server_id, tool_name, approval_hash, approved_at, approved_by).
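The per-tool approval record described above can be sketched as a small immutable structure; the field names follow the tuple in the text, and everything else (how the hash is computed, where the record is stored) is assumed to live elsewhere:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolApproval:
    server_id: str      # stable server identity, per the binding rules below
    tool_name: str
    approval_hash: str  # SHA-256 over the canonical approval document
    approved_at: str    # ISO 8601 timestamp of the explicit user approval
    approved_by: str    # user or operator who reviewed the full text
```

Keeping the record frozen makes accidental in-place mutation of stored approvals a type error rather than a silent drift source.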
Approval store (storage)
Hashes are SHA-256 over a JSON document with four fields: server_id, tool_name, description, and input_schema.
Serialize this document using RFC 8785 JSON Canonicalization Scheme (JCS) so the same content produces the same hash across runtimes. Avoid hand-rolled canonicalization; differences in Unicode escaping or float serialization can create false mismatches.
Store the description exactly as received from the server. Do not trim or reformat it.
Key the approval store by server_id. The identity rules are defined below.
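The hash computation can be sketched as follows. Note the serialization here uses `json.dumps` with sorted keys purely to illustrate the shape of the four-field document; as the text says, a real implementation should use an RFC 8785 (JCS) library rather than hand-rolled canonicalization:

```python
import hashlib
import json

def approval_hash(server_id: str, tool_name: str,
                  description: str, input_schema: dict) -> str:
    # The four-field approval document. The description is hashed exactly
    # as received: no trimming, no reformatting.
    doc = {
        "server_id": server_id,
        "tool_name": tool_name,
        "description": description,
        "input_schema": input_schema,
    }
    # Stand-in serialization for illustration only; substitute a proper
    # RFC 8785 JCS serializer so hashes agree across runtimes.
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the description is preserved byte-for-byte, even a whitespace-only change produces a different hash and trips the gate.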
Static descriptions only
Tool descriptions must be static across connections. If a server includes dynamic state (current table names, environment IDs, row counts), the hash will change on every refresh and force repeated re-approval.
Move dynamic state into a separate list_resources or get_context tool. Keep the description constant.
Reconnection check
On every connection, the client recomputes hashes for the returned tool list and compares against the store. There are three outcomes:
Match: the tool is enabled and callable
Mismatch: the tool is disabled until the user re-approves; the diff is shown
New tool not in the store: treated as a first-time approval
A tool that appears mid-session is treated the same as a first-time tool. MCP servers can push a notifications/tools/list_changed event at any time, including mid-loop. The client must suspend the autonomous agent until the new toolset clears the gate.
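The three reconnection outcomes can be sketched as a single lookup against the approval store (here a plain dict keyed by `(server_id, tool_name)`; the store shape and function names are assumptions):

```python
from enum import Enum

class GateResult(Enum):
    ENABLED = "enabled"                 # hash matches the stored approval
    BLOCKED = "blocked"                 # mismatch: disable and show a diff
    NEEDS_APPROVAL = "needs_approval"   # unknown tool: first-time approval

def check_tool(store: dict, server_id: str, tool_name: str,
               current_hash: str) -> GateResult:
    stored = store.get((server_id, tool_name))
    if stored is None:
        # New tool, including one pushed mid-session: fail closed
        # until the user approves it.
        return GateResult.NEEDS_APPROVAL
    if stored == current_hash:
        return GateResult.ENABLED
    return GateResult.BLOCKED
```

Anything other than `ENABLED` must keep the tool uncallable; the autonomous loop stays suspended until the gate clears.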
Server identity binding
The approval is bound to a server identity, not just a tool name. For stdio servers, that identity is the package name and pinned version (@org/[email protected]). For HTTP servers, it is the origin paired with an authenticated server identity (OAuth client identity per the recent MCP spec revision) and a server-reported version. Switching either field invalidates prior approvals.
Tool call log
Every tool dispatch records server_id, tool_name, approval_hash at call time, the arguments, and a result summary.
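A minimal sketch of the log entry, emitted as one JSON line per dispatch (suitable for an append-only JSONL file; the field names mirror the list above):

```python
import json
from datetime import datetime, timezone

def tool_call_record(server_id: str, tool_name: str, approval_hash: str,
                     arguments: dict, result_summary: str) -> str:
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "server_id": server_id,
        "tool_name": tool_name,
        "approval_hash": approval_hash,  # the hash active at call time
        "arguments": arguments,
        "result_summary": result_summary,
    }
    return json.dumps(entry, sort_keys=True)
```

Recording the hash at call time, not just at approval time, is what lets a later rug-pull investigation reconstruct which description the LLM was actually reasoning over for each call.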
Threat model
Baseline assumptions
The MCP server is untrusted: it controls tool names, descriptions, schemas, and return values. A gateway centralizes approval and enforcement; it does not make the server trusted
The MCP client and its approval store are trusted. The user or operator is trusted to approve tools, but the UI must make meaningful review possible (full text rendered, diff on change, no truncation by default)
The LLM treats the full description text as part of its reasoning context and may follow instructions embedded in it
Approval is per-tool, not per-server. A user approving one tool does not approve every tool the server lists now or later
Output-channel prompt injection (e.g. malicious tool return values or error messages) is a separate threat addressed by an output-handling pattern, not by this approval flow
Standard infra controls such as TLS and secret management are assumed to be in place
A note on risk
This table is not a checklist. Focus on preventing the highest-impact failures first. Detection and response are acceptable where prevention is impractical.
Phase 1: First-time approval
Focus: Catching a malicious description before the tool is callable
Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Agent behavior | Tool poisoning: Server embeds hidden instructions in the description text that steer the agent toward exfiltration or unauthorized side effects, while the user-visible summary stays benign | UI shows tool name and short summary | 1. Render the full description and schema at approval time, byte-identical to what the LLM receives 2. Per-tool approval, not blanket server approval 3. Static checks on description text for instruction-shaped patterns flagged for the user before approval | High |
Cross-server context | Cross-server shadowing: A second MCP server's description includes instructions that redirect the agent's use of an already-trusted server's tools, without the malicious tool ever being called | Per-server approval | 1. Review descriptions from all connected servers as a combined set during approval, not one server at a time 2. Cap the number of simultaneously connected servers and re-approve when the set changes 3. Run untrusted or unfamiliar servers in a separate agent instance with an isolated LLM context | High |
Tool schema | Schema manipulation: Server defines a catch-all input schema (untyped string blob or generic object) that lets the LLM dump its working context, including data retrieved from other trusted tools, into the attacker's server | Schema visible at approval | 1. Constrain argument types to the minimum required for the documented use case; reject untyped string blobs at approval review 2. Hash the schema alongside the description so changes trigger re-approval 3. Sample tool call arguments in the log for unexpected payload sizes or secret-shaped content | Medium |
Agent context budget | Description bloat: Server returns descriptions or schemas large enough to crowd out useful context or spike per-call token cost on every call | UI rendering of full text | 1. Enforce a hard byte and token cap on each tool's description and schema before approval and before hashing; reject oversized payloads at the gate 2. Cap the aggregate description size across all connected servers 3. Alert when a server's description size grows by more than a small percentage between connections, even when the user accepts the diff | Medium |
Phase 2: Subsequent connections and lifecycle
Focus: Ensuring an approved tool stays the tool that was approved
Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Approved tool definition | Rug pull: Server ships a benign description at first launch, then mutates it on a later connection after the user has already approved | One-time approval | 1. Recompute description and schema hashes on every connection and gate the tool on a match 2. Block the tool and require explicit re-approval on any mismatch, with a diff of old and new text 3. Pin server identity to a package version or version-tagged origin so a real change reads as a new identity | High |
Approval store | Identity collision: Server changes its identifying metadata (name or origin or version) in a way that lands on a different server_id, shedding its approval history while still looking familiar to the user | Server identity stored as part of approval | 1. Treat any new server_id as unapproved, even when it closely resembles a previously approved one 2. On mismatch between server-reported version and a previously seen origin, surface the change instead of auto-promoting it 3. Periodic operator review of the approval store for orphaned or duplicate server entries | Medium |
Tool call log | Silent invocation: A tool runs without a record that ties the call to the approval hash that was active at the time, so a later rug pull can't be reconstructed | Generic request logs | 1. Log server_id, tool_name, and the active approval_hash on every tool dispatch 2. Retain logs long enough to span typical disclosure windows (months, not days) 3. Alert on an approval-hash change between connections even when the user re-approves the new value | Low |
Agent execution loop | Mid-loop toolset swap: A server pushes notifications/tools/list_changed while the agent loop is running, swapping in a mutated toolset between dispatches | Gate enforced on initial connect | 1. Suspend the agent loop on any list_changed notification until the refreshed toolset clears the gate 2. Version the approved toolset and require version match before every dispatch 3. Fail closed if gate state is unknown or pending | Medium |
If you run gateway-mediated approval
If a central gateway holds the approval state for many clients, the threat profile shifts:
The gateway's hash recomputation cadence becomes the effective re-check frequency for every downstream client
A compromise of the gateway compromises every downstream agent's approval gate, so the gateway's own integrity becomes the high-value asset
Per-tool approval can still be enforced, but the user prompt now lives in an operator console rather than the end user's IDE; reviewers must understand that they are approving on behalf of the agent's principal
Cross-server shadowing review happens once at the gateway instead of in each client
FAQs
Do I need this if all my MCP servers are internal?
Internal servers narrow the threat to insider risk and supply-chain compromise of the packages or images you build the servers from, but they do not eliminate it. A description shipped by a teammate's PR or pulled from a private registry is still a description the LLM will follow.
How is this different from generic supply-chain review?
Generic dependency review checks code that runs on your hosts. The approval flow checks a different artifact: the text and schema the LLM will reason over. Two MCP server versions can have identical code review approval and ship different tool descriptions, and only the description hash will catch that.
Verification checklist
Approval gate
Open the approval prompt; confirm the full description and schema are visible without truncation or scroll-to-reveal
Approve one tool from a multi-tool server; no other tool from that server is callable
Have the server push notifications/tools/list_changed mid-session; the new tool is not callable until it clears the gate
Hashing and storage
The hash is SHA-256 over an RFC 8785 (JCS) serialization of an approval document holding server_id, tool_name, description, and input_schema; the description field is preserved byte-for-byte so whitespace changes always change the hash
Re-version a previously-approved server; previous approvals do not auto-apply to the new server_id
Restart and upgrade the client; the approval store survives both
Reconnection check
Disconnect and reconnect a previously-approved server; tool calls fail until hash verification completes
On a hash mismatch, the affected tool stays disabled and the user sees a diff rather than a yes/no prompt
Connect a server with a renamed identity; no prior approval is inherited and the gate fires as first-time
Simulate approval store or hash failure; tool execution fails closed and no tool is callable
Multi-server review
When more than one server is connected, the approval UI shows descriptions from both in a single review step before enabling tools
The number of simultaneously connected servers is bounded and reviewed; untrusted servers run in a separate agent instance
Audit trail
Every tool call logs server_id, tool_name, the approval_hash active at call time, arguments, and a result summary
An approval-hash change between connections is logged and alerted even when the user accepts the re-approval prompt
Logs are retained long enough to investigate a description rotation discovered weeks later
Attacker tests
Description swap: change the description text on a server and reconnect; the tool should stay disabled until re-approval
Schema swap: widen an input schema (add a free-form context field) and reconnect; the same gate should fire
Identity swap: rename or re-version the server and confirm previously approved tools are not auto-trusted under the new server_id
Mid-session tool addition: have the server return an extra tool on a refreshed tools/list and confirm it is gated, not silently added
Shadowing rehearsal: connect a second server whose description references another server's tool by name and verify the combined-review prompt surfaces it
Implementation & Review
The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
