When an API call that changes state fails mid-flight (timeout, dropped connection), the client cannot tell whether the server processed it. The client retries. If the server did process the first request, the retry creates a duplicate: a double charge, a second order, an extra notification. Idempotency keys give each request a unique identifier so the server can recognize retries and return the original result instead of processing again. This post walks through the key lifecycle, the write-first pattern, and where things break.

System description

An API receives a state-changing request with an Idempotency-Key header, reserves the key in a durable store before doing any work, writes the business state and a pending job in a single transaction, then hands off external side effects to a background worker. On a valid retry for the same request, the server returns the cached result instead of reprocessing.

Architecture choice

The safety trade-offs depend on whether your key store can participate in the same transaction as your business logic.

Database-backed (PostgreSQL, MySQL): reference default

The key store lives in the same database as your application data. Reservation, business write, and outbox entry all go into a single transaction: if one fails, all fail.

Use this when:

  • You process payments or other operations where a partial failure means real money lost

  • You need the key reservation and the business write to be atomic

  • Your request volume fits within your database's write capacity

Trade-off: requires connection pooling at scale, and write throughput is bounded by your database. That bound is the price of atomicity, which a cache layer cannot give you.

Cache-backed (Redis, DynamoDB): performance compromise

The key store is a fast, TTL-aware cache separate from your application database.

Use this when:

  • Your operations are naturally idempotent or low-stakes (e.g., toggling a flag, sending a non-critical notification)

  • You can tolerate a small window where a duplicate slips through

  • Built-in TTL management matters more than transactional consistency

Trade-off: Redis without persistence loses all keys on restart. Replication lag can cause two nodes to accept the same key independently. During failover, the promoted replica may be missing recent writes. In a multi-node setup, split-brain scenarios let both partitions accept requests. These are the same conditions that cause the retries you are trying to deduplicate.

Common middle ground: PostgreSQL for key reservation and result storage (same transaction as business logic), with a read-through Redis cache in front for fast lookups on retries.

Golden path

Build this first. Then relax constraints only if you have a specific reason:

Request arrives with Idempotency-Key header → reserve key → compare request fingerprint → write business state + pending job in one transaction → background worker calls external service → store completion result → return cached result on retry

Each step is a gate. If reservation fails because the key already exists, the server skips processing and returns the stored result.
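
The gate ordering can be sketched in a few lines of Python. This is a single-process illustration only: the dict stands in for the durable key store, and the names (`handle`, `process`) are invented for the example. The atomic reservation and transactional writes described later are what make this safe under real concurrency.

```python
# Illustrative, single-process sketch of the golden-path gates. The dict
# only shows the ordering of checks; a real handler uses the durable,
# transactional reservation described in the sections that follow.
import hashlib
import json

store = {}  # (tenant_id, key) -> {"fingerprint", "status", "response"}

def handle(tenant_id, key, body, process):
    fp = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()).hexdigest()
    record = store.get((tenant_id, key))
    if record is not None:                       # gate: key already reserved
        if record["fingerprint"] != fp:          # gate: fingerprint mismatch
            return 422, {"error": "key reused with different parameters"}
        return record["status"], record["response"]  # valid retry: cached result
    status, response = process(body)             # new request: do the work
    store[(tenant_id, key)] = {
        "fingerprint": fp, "status": status, "response": response}
    return status, response
```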

Minimal system context

  • Client (caller): generates the idempotency key and sends it with every state-changing request

  • API handler (control plane): validates the key, reserves it, orchestrates processing

  • Key store (reservation): a durable table in the same database as application state. Holds the key, fingerprint, lock timestamp, and cached response

  • Outbox (delivery guarantee): rows written atomically alongside the business state. The async worker reads these

  • Async worker (side-effect executor): pulls from the outbox, calls external services, records completion

  • External service (data plane): payment processor, notification provider, etc. Anything outside your transaction boundary

Core design

Key format and scope

The IETF draft defines the Idempotency-Key header as a string value, commonly a UUID v4 or another high-entropy opaque string. The client generates the key and sends it with every request that changes state.
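
On the client side this is a one-liner; the function name is invented for illustration. The important discipline is that the key is minted once per logical operation and reused on every retry of that operation, not regenerated per HTTP attempt.

```python
# Client-side sketch: mint one key per logical operation and reuse it on
# every retry of that operation. Header name follows the IETF draft.
import uuid

def headers_for_operation():
    """Generate headers for a new logical operation (not per HTTP attempt)."""
    return {"Idempotency-Key": str(uuid.uuid4())}
```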

The key must be scoped to the caller. In a multi-tenant system, the unique constraint in your database should be (tenant_id, idempotency_key), not just (idempotency_key). Without tenant scoping, two different customers who happen to generate the same UUID would collide: one gets the other's cached response.

Minimum schema for the key store:

  • idempotency_key: the client-provided value (max 255 chars)

  • tenant_id: scoping boundary (foreign key)

  • request_fingerprint: hash of the request body (for mismatch detection)

  • locked_at: timestamp (for concurrent request handling)

  • response_code: cached HTTP status

  • response_body: cached response (JSONB)

  • created_at: for TTL enforcement
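
As a concrete sketch, the schema renders roughly as follows. The column types here are sqlite3 stand-ins chosen for the example; a PostgreSQL deployment would use TIMESTAMPTZ and JSONB as noted above.

```python
# Key-store schema sketch in sqlite3. Types are illustrative assumptions;
# PostgreSQL would use TIMESTAMPTZ for timestamps and JSONB for the body.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE idempotency_keys (
        id                  INTEGER PRIMARY KEY,
        tenant_id           TEXT NOT NULL,
        idempotency_key     TEXT NOT NULL
                            CHECK (length(idempotency_key) <= 255),
        request_fingerprint TEXT NOT NULL,  -- SHA-256 of the canonical body
        locked_at           TEXT,           -- held while a request processes
        response_code       INTEGER,        -- cached HTTP status
        response_body       TEXT,           -- cached response (JSONB in Postgres)
        created_at          TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- for TTL
        UNIQUE (tenant_id, idempotency_key) -- tenant-scoped, never global
    )
""")
```

The composite UNIQUE constraint is the piece that enforces tenant scoping at the storage layer, independent of application code.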

The reservation (first-write lock)

The reservation is the most important step. Before the server does any work, it tries to insert the key into the store. If the insert succeeds, this is a new request and processing begins. If the insert fails because the key already exists, this is a retry and the server returns the cached result.

The insert must be a single atomic operation. The classic bug is a two-step check: first query whether the key exists, then insert if it doesn't. Two requests arriving milliseconds apart both pass the check, both insert, both process. The fix is a single statement that checks and inserts in one operation:

INSERT INTO idempotency_keys (tenant_id, idempotency_key, request_fingerprint, locked_at)
VALUES ($1, $2, $3, NOW())
ON CONFLICT (tenant_id, idempotency_key) DO NOTHING
RETURNING id;

If this returns a row, the key is new and the server holds the lock. If it returns nothing, the key already exists. One database round-trip, no race window.

For concurrent requests with the same key, a common implementation uses a locked_at timestamp. The first request acquires the lock. Concurrent requests see the lock and return 409 Conflict with a Retry-After header, telling the client to back off. Without Retry-After, poorly written clients hammer the endpoint in a tight loop while the lock is still held, turning normal retries into unnecessary load on the reservation path.
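
The reservation can be sketched with sqlite3 standing in for PostgreSQL. Older sqlite3 builds lack RETURNING, so this version inspects rowcount instead; the single-statement property is the same.

```python
# Atomic reservation sketch: one INSERT ... ON CONFLICT DO NOTHING, no
# check-then-act gap. sqlite3 stands in for PostgreSQL here; instead of
# RETURNING we check rowcount (1 = inserted, 0 = key already existed).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE idempotency_keys (
    tenant_id TEXT, idempotency_key TEXT, request_fingerprint TEXT,
    locked_at TEXT, UNIQUE (tenant_id, idempotency_key))""")

def reserve(tenant_id, key, fingerprint):
    """Return True if this call acquired the key, False if it is a retry."""
    cur = conn.execute(
        """INSERT INTO idempotency_keys
               (tenant_id, idempotency_key, request_fingerprint, locked_at)
           VALUES (?, ?, ?, datetime('now'))
           ON CONFLICT (tenant_id, idempotency_key) DO NOTHING""",
        (tenant_id, key, fingerprint))
    conn.commit()
    return cur.rowcount == 1
```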

Request fingerprinting

When a retry arrives, the server needs to verify that the retry is actually the same request, not a different request reusing the same key by accident.

Store a hash of the request parameters (SHA-256) alongside the key. On retry, compute the hash again and compare. If they match, return the cached result. If they don't, reject with a 4xx error (commonly 422). AWS returns IdempotentParameterMismatch for the same case.

Fingerprints should be computed from a canonical representation of the business parameters, not the raw HTTP body bytes. Two logically identical retries can serialize JSON differently due to key ordering, whitespace, or library differences. Sort keys or normalize fields before hashing so equivalent retries do not fail due to serialization variance.
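
A minimal canonicalization is to serialize with sorted keys and fixed separators before hashing, so key order and whitespace no longer affect the digest:

```python
# Canonical fingerprint sketch: sort keys and pin separators so logically
# identical bodies always hash to the same value.
import hashlib
import json

def fingerprint(body: dict) -> str:
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```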

Without fingerprinting, a client that accidentally reuses a key with a different amount or different recipient gets back a stale response from the original request. In a payment system, that means charging $10 when the intent was $20, or sending money to the wrong account.

Partial failure and the write-first pattern

The hardest problem in idempotency is partial failure. Consider this sequence:

  1. Server receives request with idempotency key

  2. Server calls payment processor, card is charged

  3. Server tries to write the idempotency result to the database

  4. Database write fails (connection lost, disk full, timeout)

The charge happened, but the server has no record of it. The client retries, and the server processes it as a new request. Double charge.

The fix is to reverse the order: write first, call the external service second. Save the intent and the idempotency key in the same database transaction before calling the payment processor. Then publish the external call from a durable outbox table. A background worker reads the outbox and sends the request to the payment processor. If the worker fails, it retries from the outbox. The database transaction is the source of truth.

This is Brandur Leach's "atomic phases" pattern. The key insight: your database transaction succeeds or fails as a unit. External calls happen after the transaction commits, driven by an outbox that guarantees delivery. If the downstream provider supports idempotency, carry the same key or correlation ID into those calls. Your local transaction prevents duplicate intent creation; provider-side idempotency prevents duplicate external execution when the worker retries.
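
A compressed sketch of the write-first ordering, with sqlite3 standing in for the database and a callable standing in for the payment processor. Table and function names here are illustrative, not from the pattern itself.

```python
# Write-first sketch: the business intent and the outbox row commit in a
# single transaction; the external call happens afterwards, driven by the
# outbox. Names (payments, outbox, charge) are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY, payment_id INTEGER, state TEXT);
""")

def create_payment(amount):
    with conn:  # one transaction: both rows commit or neither does
        cur = conn.execute(
            "INSERT INTO payments (amount, status) VALUES (?, 'pending')",
            (amount,))
        conn.execute(
            "INSERT INTO outbox (payment_id, state) VALUES (?, 'pending')",
            (cur.lastrowid,))
    return cur.lastrowid

def run_worker(charge):
    # Drain pending outbox rows; `charge` is the external call, which
    # should itself carry an idempotency key or correlation ID downstream.
    rows = conn.execute(
        "SELECT id, payment_id FROM outbox WHERE state = 'pending'").fetchall()
    for outbox_id, payment_id in rows:
        charge(payment_id)
        with conn:
            conn.execute(
                "UPDATE outbox SET state = 'done' WHERE id = ?", (outbox_id,))
            conn.execute(
                "UPDATE payments SET status = 'charged' WHERE id = ?",
                (payment_id,))
```

If the process dies between `create_payment` and `run_worker`, the pending outbox row survives and the next worker pass picks it up; nothing external has happened yet, so nothing can be duplicated.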

A related failure mode is partial coverage. The WooCommerce-Stripe plugin had a bug where idempotency keys were set for the /charges endpoint but not for /payment_intents. Subscription renewals created duplicate payment intents on retry because the second API call in the sequence had no idempotency protection. If a multi-step flow has idempotency on some endpoints but not others, the unprotected endpoints become the failure point.

What to store and for how long

Store the HTTP status code and response body alongside the key. On retry, return exactly what the original request returned.

Some implementations cache all responses, including 500 errors. The risk: if you cache a server error, every retry returns the cached error even after the underlying bug is fixed. A safer default is to only cache 2xx and 4xx responses. Let 5xx responses be retried fresh, since server errors are usually transient.
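
That policy reduces to a one-line predicate (the function name is invented for illustration): cache outcomes that are deterministic for this request, retry the rest fresh.

```python
# Cache-policy sketch: persist only success (2xx) and caller-error (4xx)
# outcomes; transient server errors (5xx) are retried fresh.
def should_cache(status: int) -> bool:
    return 200 <= status < 300 or 400 <= status < 500
```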

TTL depends on your use case. Common values range from 24 hours to 30 days. After the TTL expires, the key is eligible for cleanup and the same key value can create a new request. Pick a TTL that covers your longest realistic retry window.

Background workers

Three background processes keep the system healthy:

Stale-processing sweeper. Finds keys stuck in "processing" (where locked_at is older than a threshold, say 15 minutes). These represent requests that started but never finished, likely due to a crash. Mark them as failed and hand them to reconciliation for operator review. Do not blindly replay side effects; only retry if downstream correlation proves no completion happened.

Reaper. Deletes keys older than the TTL. Run on a schedule (hourly or daily) and delete in batches to avoid locking the table.
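
The reaper's batched delete might look like the following; the TTL and batch size are illustrative, and sqlite3 again stands in for the production database.

```python
# Reaper sketch: delete expired keys in bounded batches so no single
# statement holds a long table lock. TTL and batch size are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE idempotency_keys (
    id INTEGER PRIMARY KEY, created_at TEXT)""")

def reap_expired(ttl_days=30, batch_size=1000):
    deleted = 0
    while True:
        cur = conn.execute(
            """DELETE FROM idempotency_keys WHERE id IN (
                   SELECT id FROM idempotency_keys
                   WHERE created_at < datetime('now', ?)
                   LIMIT ?)""",
            (f"-{ttl_days} days", batch_size))
        conn.commit()
        if cur.rowcount == 0:
            return deleted
        deleted += cur.rowcount
```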

Reconciliation. Compares your internal ledger against the external provider's records. If the payment processor shows a charge that your ledger doesn't have, something went wrong in the write path. Run daily for payment systems.

Threat model

Baseline assumptions

  • Clients are untrusted: they can retry, replay, and send concurrent duplicates

  • The API authenticates callers and derives tenant context from the auth token, not the request body

  • Standard infrastructure controls (TLS, WAF, database AuthN) are in place. This model focuses on the idempotency mechanism itself

  • The key store is durable (database-backed for the reference design)

  • External side effects are delivered via an outbox worker, not inline during request handling

A note on risk: you won’t fix everything

This table isn’t a checklist where every row must be fully eliminated. Focus on preventing the worst failures and limiting blast radius. In practice: ship prevention for the High rows first, then add monitoring and response for what you can’t realistically prevent.

Asset: Request processing
Threat: Concurrent first-writer race: two requests with the same key arrive on different servers, both pass the existence check, both process
Baseline controls: Shared durable key store
Mitigations:
  1. Atomic reservation: single INSERT ON CONFLICT with no check-then-act gap
  2. locked_at timestamp with 409 + Retry-After for concurrent arrivals
  3. Database UNIQUE constraint as final safety net
Risk: High

Asset: Cached responses
Threat: Cross-tenant key collision: two tenants generate the same key value, one receives the other's cached response
Baseline controls: Auth required on all endpoints
Mitigations:
  1. Composite unique constraint: (tenant_id, idempotency_key)
  2. Key lookup always includes tenant context from auth token
  3. Return 404 if key belongs to a different tenant
Risk: High

Asset: Data integrity
Threat: Parameter mismatch: same key sent with different request body, server returns wrong cached result
Baseline controls: Key existence check
Mitigations:
  1. Store request fingerprint (SHA-256, canonicalized) with the key
  2. Compare fingerprint on retry; return 422 if mismatched
Risk: High

Asset: Financial integrity
Threat: Partial failure: external service processes the request but the local write fails, next retry double-processes
Baseline controls: Durable key store
Mitigations:
  1. Write-first: save intent + key in one DB transaction before calling external service
  2. Outbox pattern: publish external calls from durable queue
  3. Reconciliation: compare external records to internal ledger daily
Risk: High

Asset: Payment flow integrity
Threat: Incomplete coverage: some endpoints in a multi-step flow enforce idempotency, others don't
Baseline controls: Per-endpoint implementation
Mitigations:
  1. Require Idempotency-Key on all state-changing endpoints
  2. Return 400 if a POST is missing the header
  3. Audit: enumerate all state-changing endpoints and verify each enforces the header
Risk: High

Asset: Request processing path
Threat: Fail-open on degraded store: key store unavailable, handler bypasses reservation and processes the request anyway
Baseline controls: Required header, durable key store
Mitigations:
  1. Fail closed: reject state-changing requests if reservation or lookup cannot complete
  2. Alert on reservation failure rate and store health
  3. Circuit-break state-changing endpoints when reservation path is unhealthy
Risk: High

Asset: Outbox worker / downstream execution
Threat: Worker replay: worker retries the same durable intent multiple times due to retries, poison-pill loops, or missing downstream correlation
Baseline controls: Durable outbox, retryable worker
Mitigations:
  1. Carry the idempotency key or correlation ID into downstream provider calls
  2. Record downstream object / provider ID before marking outbox item complete
  3. Dead-letter repeatedly failing items after bounded retries
  4. Alert on dead-letter queue depth
Risk: High

Asset: Cached responses
Threat: Key enumeration: attacker guesses key values to probe for cached responses
Baseline controls: UUID v4 (128-bit entropy)
Mitigations:
  1. Composite lookup requires matching tenant_id
  2. Rate-limit key lookups per caller
Risk: Low

Asset: Request integrity
Threat: Replay after TTL: attacker intercepts a request, replays it after the key expires to re-trigger processing
Baseline controls: Key TTL
Mitigations:
  1. Request-level timestamps: reject requests older than a threshold
  2. Bind key to session or auth token, not just tenant
  3. Shorter TTL for high-value operations
Risk: Medium

Asset: Availability
Threat: Error caching: 500 response cached, retries return stale error after the bug is fixed
Baseline controls: Response storage
Mitigations:
  1. Only cache 2xx and 4xx responses; let 5xx be retried fresh
  2. If you cache all responses, provide a manual cache-bust for operations teams
Risk: Medium

Asset: Retry safety
Threat: TTL race: key expires while the client is still retrying, next retry treated as a new request
Baseline controls: TTL policy
Mitigations:
  1. Set TTL longer than your longest realistic retry window
  2. Return a header indicating key expiry time
  3. For multi-day flows, extend TTL or use a separate tracking mechanism
Risk: Medium

Asset: Key store capacity
Threat: Key spraying: attacker creates thousands of keys per second, exhausting storage
Baseline controls: Auth required
Mitigations:
  1. Rate-limit key creation per tenant
  2. Reaper job deletes expired keys on schedule
  3. Cap active keys per tenant
Risk: Low

Asset: Response freshness
Threat: Stale cache: cached "success" for a resource later cancelled or reversed
Baseline controls: Immutable response cache
Mitigations:
  1. Cache the response as-is; don't sync with downstream state changes
  2. Idempotency cache answers "did this request already run?" not "what is the current state?"
  3. Clients needing current state call a separate GET endpoint
Risk: Low

Verification checklist

  • Key reservation and tenant isolation

    • Send two identical POST requests simultaneously; only one creates the resource, both return the same response

    • Same key with different request body returns 422

    • Two different tenants send the same key value; each gets their own independent result, no cached-response bleed

    • Same key and payload but different auth/session context: verify lookup is scoped correctly

    • Keys have (tenant_id, idempotency_key) as the unique constraint

  • Concurrent and distributed behavior

    • Concurrent requests with the same key from different app nodes: only one path performs side effects

    • 409 Conflict response includes Retry-After header

    • POST request without Idempotency-Key header returns 400

  • Failure handling and resilience

    • Crash the worker after the external side effect succeeds but before local completion is recorded; retry and reconciliation do not create a duplicate side effect

    • Bring the key store down; endpoint fails closed before any business mutation or external call

    • First request returns 500; retry re-attempts processing instead of returning cached 500

    • Every state-changing endpoint in a multi-step flow either enforces Idempotency-Key or rejects the request

  • Lifecycle

    • Retry after TTL expiry creates a new request and does not silently duplicate the original side effect

    • Reaper job deletes expired keys without locking the table for production traffic

  • Observability and detection

    • Duplicate/replay metrics and alerts exist

    • Duplicate request rate is tracked per tenant (spikes may indicate client bugs or replay attacks)

    • Stale-processing sweeper resolves stuck keys within the lock timeout window

    • Cached response for tenant A is not accessible by tenant B

Implementation & Review

The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
