When an API call that changes state fails mid-flight (timeout, dropped connection), the client cannot tell whether the server processed it. The client retries. If the server did process the first request, the retry creates a duplicate: a double charge, a second order, an extra notification. Idempotency keys give each request a unique identifier so the server can recognize retries and return the original result instead of processing again. This post walks through the key lifecycle, the write-first pattern, and where things break.
System description
An API receives a state-changing request with an Idempotency-Key header, reserves the key in a durable store before doing any work, writes the business state and a pending job in a single transaction, then hands off external side effects to a background worker. On a valid retry for the same request, the server returns the cached result instead of reprocessing.

Architecture choice
The safety trade-offs depend on whether your key store can participate in the same transaction as your business logic.
Database-backed (PostgreSQL, MySQL): reference default
The key store lives in the same database as your application data. Reservation, business write, and outbox entry all go into a single transaction: if one fails, all fail.
Use this when:
You process payments or other operations where a partial failure means real money lost
You need the key reservation and the business write to be atomic
Your request volume fits within your database's write capacity
Trade-off: Requires connection pooling at scale, and write throughput is bounded by your database. What you get in return is atomicity between the key reservation and the business write, which a cache layer cannot give you.
Cache-backed (Redis, DynamoDB): performance compromise
The key store is a fast, TTL-aware cache separate from your application database.
Use this when:
Your operations are naturally idempotent or low-stakes (e.g., toggling a flag, sending a non-critical notification)
You can tolerate a small window where a duplicate slips through
Built-in TTL management matters more than transactional consistency
Trade-off: Redis without persistence loses all keys on restart. Replication lag can cause two nodes to accept the same key independently. During failover, the promoted replica may be missing recent writes. In a multi-node setup, split-brain scenarios let both partitions accept requests. These are the same conditions that cause the retries you are trying to deduplicate.
Common middle ground: PostgreSQL for key reservation and result storage (same transaction as business logic), with a read-through Redis cache in front for fast lookups on retries.
Golden path
Build this first. Then relax constraints only if you have a specific reason:
Request arrives with Idempotency-Key header → reserve key → compare request fingerprint → write business state + pending job in one transaction → background worker calls external service → store completion result → return cached result on retry
Each step is a gate. If reservation fails because the key already exists, the server skips processing and returns the stored result.
Minimal system context
Client (caller): generates the idempotency key and sends it with every state-changing request
API handler (control plane): validates the key, reserves it, orchestrates processing
Key store (reservation): a durable table in the same database as application state. Holds the key, fingerprint, lock timestamp, and cached response
Outbox (delivery guarantee): rows written atomically alongside the business state. The async worker reads these
Async worker (side-effect executor): pulls from the outbox, calls external services, records completion
External service (data plane): payment processor, notification provider, etc. Anything outside your transaction boundary
Core design
Key format and scope
The IETF draft specification defines the Idempotency-Key header as a string value, commonly a UUID v4 or another high-entropy opaque string. The key is generated by the client and sent with every request that changes state.
The key must be scoped to the caller. In a multi-tenant system, the unique constraint in your database should be (tenant_id, idempotency_key), not just (idempotency_key). Without tenant scoping, two different customers who happen to generate the same UUID would collide: one gets the other's cached response.
Minimum schema for the key store:
idempotency_key: the client-provided value (max 255 chars)
tenant_id: scoping boundary (foreign key)
request_fingerprint: hash of the request body (for mismatch detection)
locked_at: timestamp (for concurrent request handling)
response_code: cached HTTP status
response_body: cached response (JSONB)
created_at: for TTL enforcement
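One possible rendering of this schema, sketched in Python against an in-memory SQLite database (in PostgreSQL, `response_body` would be JSONB and the timestamps would be TIMESTAMPTZ; the column names follow the list above):

```python
import sqlite3

# Illustrative rendering of the minimum key-store schema described above.
# The composite UNIQUE constraint enforces tenant scoping at the database level.
SCHEMA = """
CREATE TABLE idempotency_keys (
    id                  INTEGER PRIMARY KEY,
    tenant_id           INTEGER NOT NULL,
    idempotency_key     TEXT NOT NULL CHECK (length(idempotency_key) <= 255),
    request_fingerprint TEXT NOT NULL,
    locked_at           TEXT,
    response_code       INTEGER,
    response_body       TEXT,
    created_at          TEXT NOT NULL DEFAULT (datetime('now')),
    UNIQUE (tenant_id, idempotency_key)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
```

With this constraint in place, a duplicate (tenant_id, idempotency_key) pair is rejected by the database even if application-level checks are bypassed, while the same key value under a different tenant is accepted.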
The reservation (first-write lock)
The reservation is the most important step. Before the server does any work, it tries to insert the key into the store. If the insert succeeds, this is a new request and processing begins. If the insert fails because the key already exists, this is a retry and the server returns the cached result.
The insert must be a single atomic operation. The classic bug is a two-step check: first query whether the key exists, then insert if it doesn't. Two requests arriving milliseconds apart both pass the check, both insert, both process. The fix is a single statement that checks and inserts in one operation:
```sql
INSERT INTO idempotency_keys (tenant_id, idempotency_key, request_fingerprint, locked_at)
VALUES ($1, $2, $3, NOW())
ON CONFLICT (tenant_id, idempotency_key) DO NOTHING
RETURNING id;
If this returns a row, the key is new and the server holds the lock. If it returns nothing, the key already exists. One database round-trip, no race window.
For concurrent requests with the same key, a common implementation uses a locked_at timestamp. The first request acquires the lock. Concurrent requests see the lock and return 409 Conflict with a Retry-After header, telling the client to back off. Without Retry-After, poorly written clients hammer the endpoint in a tight loop while the lock is still held, turning normal retries into unnecessary load on the reservation path.
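A minimal Python sketch of this reservation logic, using SQLite's flavor of the same upsert (the `reserve` function, its return values, and the 30-second Retry-After are illustrative assumptions, not a reference implementation):

```python
import sqlite3
from datetime import datetime, timezone

def reserve(conn, tenant_id, key, fingerprint):
    """Attempt to claim the key in one atomic statement.

    Returns ("new", None) if this request now holds the lock,
    ("locked", retry_after_seconds) if another request is still processing,
    ("replay", row) if a completed result is already cached.
    """
    now = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT INTO idempotency_keys"
        " (tenant_id, idempotency_key, request_fingerprint, locked_at)"
        " VALUES (?, ?, ?, ?)"
        " ON CONFLICT (tenant_id, idempotency_key) DO NOTHING",
        (tenant_id, key, fingerprint, now),
    )
    if cur.rowcount == 1:       # the insert happened: new request, lock held
        return ("new", None)
    row = conn.execute(
        "SELECT request_fingerprint, response_code, response_body"
        " FROM idempotency_keys WHERE tenant_id = ? AND idempotency_key = ?",
        (tenant_id, key),
    ).fetchone()
    # Caller should compare row[0] against the incoming fingerprint
    # and return 422 on mismatch (see "Request fingerprinting").
    if row[1] is None:          # no cached status yet: still processing
        return ("locked", 30)   # surface as 409 Conflict + Retry-After: 30
    return ("replay", row)      # finished: return the cached status and body
```

The insert and the existence check are the same statement, so there is no window between "check" and "insert" for a concurrent duplicate to slip through.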
Request fingerprinting
When a retry arrives, the server needs to verify that the retry is actually the same request, not a different request reusing the same key by accident.
Store a hash of the request parameters (SHA-256) alongside the key. On retry, compute the hash again and compare. If they match, return the cached result. If they don't, reject with a 4xx error (commonly 422). AWS returns IdempotentParameterMismatch for the same case.
Fingerprints should be computed from a canonical representation of the business parameters, not the raw HTTP body bytes. Two logically identical retries can serialize JSON differently due to key ordering, whitespace, or library differences. Sort keys or normalize fields before hashing so equivalent retries do not fail due to serialization variance.
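The canonicalization step can be as small as this sketch (the `fingerprint` name is illustrative):

```python
import hashlib
import json

def fingerprint(params: dict) -> str:
    """Hash a canonical form of the business parameters, not raw body bytes.

    sort_keys plus fixed separators makes logically identical payloads hash
    identically even when clients serialize JSON in different key orders or
    with different whitespace.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two retries that differ only in JSON key order produce the same digest, while a changed amount or recipient produces a different one and can be rejected with a 422.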
Without fingerprinting, a client that accidentally reuses a key with a different amount or different recipient gets back a stale response from the original request. In a payment system, that means charging $10 when the intent was $20, or sending money to the wrong account.
Partial failure and the write-first pattern
The hardest problem in idempotency is partial failure. Consider this sequence:
Server receives request with idempotency key
Server calls payment processor, card is charged
Server tries to write the idempotency result to the database
Database write fails (connection lost, disk full, timeout)
The charge happened, but the server has no record of it. The client retries, and the server processes it as a new request. Double charge.
The fix is to reverse the order: write first, call the external service second. Save the intent and the idempotency key in the same database transaction before calling the payment processor. Then publish the external call from a durable outbox table. A background worker reads the outbox and sends the request to the payment processor. If the worker fails, it retries from the outbox. The database transaction is the source of truth.
This is Brandur Leach's "atomic phases" pattern. The key insight: your database transaction succeeds or fails as a unit. External calls happen after the transaction commits, driven by an outbox that guarantees delivery. If the downstream provider supports idempotency, carry the same key or correlation ID into those calls. Your local transaction prevents duplicate intent creation; provider-side idempotency prevents duplicate external execution when the worker retries.
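A sketch of the write-first transaction under these assumptions (the table names and the `create_payment` helper are illustrative):

```python
import sqlite3

def create_payment(conn, tenant_id, key, amount):
    """Write the business intent and the outbox row in ONE transaction.

    The external charge is NOT made here; a worker reads the outbox after
    commit. If this transaction fails, nothing was charged and nothing
    needs undoing. The database transaction is the source of truth.
    """
    with conn:  # sqlite3 context manager: commit on success, rollback on error
        cur = conn.execute(
            "INSERT INTO payments (tenant_id, idempotency_key, amount, status)"
            " VALUES (?, ?, ?, 'pending')",
            (tenant_id, key, amount),
        )
        payment_id = cur.lastrowid
        # Carry the idempotency key downstream so the worker can pass it to
        # the provider, guarding against duplicate external execution.
        conn.execute(
            "INSERT INTO outbox (payment_id, idempotency_key, state)"
            " VALUES (?, ?, 'queued')",
            (payment_id, key),
        )
        return payment_id
```

Because both rows commit or roll back together, a crash at any point leaves either a complete record (intent plus queued side effect) or no record at all, never a charge with no trace.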
A related failure mode is partial coverage. The WooCommerce-Stripe plugin had a bug where idempotency keys were set for the /charges endpoint but not for /payment_intents. Subscription renewals created duplicate payment intents on retry because the second API call in the sequence had no idempotency protection. If a multi-step flow has idempotency on some endpoints but not others, the unprotected endpoints become the failure point.
What to store and for how long
Store the HTTP status code and response body alongside the key. On retry, return exactly what the original request returned.
Some implementations cache all responses, including 500 errors. The risk: if you cache a server error, every retry returns the cached error even after the underlying bug is fixed. A safer default is to only cache 2xx and 4xx responses. Let 5xx responses be retried fresh, since server errors are usually transient.
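That policy reduces to a one-line predicate; a sketch (the function name is illustrative):

```python
def should_cache(status: int) -> bool:
    # Cache successes and client errors; let server errors be retried fresh,
    # since 5xx responses are usually transient and may be fixed by the
    # time the client retries.
    return 200 <= status < 300 or 400 <= status < 500
```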
TTL depends on your use case. Common values range from 24 hours to 30 days. After the TTL expires, the key is eligible for cleanup and the same key value can create a new request. Pick a TTL that covers your longest realistic retry window.
Background workers
Three background processes keep the system healthy:
Stale-processing sweeper. Finds keys stuck in "processing" (where locked_at is older than a threshold, say 15 minutes). These represent requests that started but never finished, likely due to a crash. Mark them as failed and hand them to reconciliation for operator review. Do not blindly replay side effects; only retry if downstream correlation proves no completion happened.
Reaper. Deletes keys older than the TTL. Run on a schedule (hourly or daily) and delete in batches to avoid locking the table.
Reconciliation. Compares your internal ledger against the external provider's records. If the payment processor shows a charge that your ledger doesn't have, something went wrong in the write path. Run daily for payment systems.
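The reaper's batched delete might look like this, sketched against SQLite (batch size and TTL values are illustrative; the loop keeps each statement small so no single delete holds a long lock on the table):

```python
import sqlite3

def reap_expired(conn, ttl_days=30, batch_size=1000):
    """Delete keys older than the TTL in small batches.

    Returns the total number of rows deleted. Each batch is its own
    transaction, so production traffic interleaves between batches.
    """
    total = 0
    while True:
        with conn:
            cur = conn.execute(
                "DELETE FROM idempotency_keys WHERE id IN ("
                "  SELECT id FROM idempotency_keys"
                "  WHERE created_at < datetime('now', ?) LIMIT ?)",
                (f"-{ttl_days} days", batch_size),
            )
        if cur.rowcount == 0:
            return total
        total += cur.rowcount
```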
Threat model
Baseline assumptions
Clients are untrusted: they can retry, replay, and send concurrent duplicates
The API authenticates callers and derives tenant context from the auth token, not the request body
Standard infrastructure controls (TLS, WAF, database AuthN) are in place. This model focuses on the idempotency mechanism itself
The key store is durable (database-backed for the reference design)
External side effects are delivered via an outbox worker, not inline during request handling
A note on risk: you won’t fix everything
This table isn’t a checklist where every row must be fully eliminated. Focus on preventing the worst failures and limiting blast radius. In practice: ship prevention for the High rows first, then add monitoring and response for what you can’t realistically prevent.
Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Request processing | Concurrent first-writer race: two requests with the same key arrive on different servers, both pass the existence check, both process | Shared durable key store | 1. Atomic reservation: a single `INSERT ... ON CONFLICT DO NOTHING` statement 2. Database UNIQUE constraint as final safety net | High |
Cached responses | Cross-tenant key collision: two tenants generate the same key value, one receives the other's cached response | Auth required on all endpoints | 1. Composite unique constraint on `(tenant_id, idempotency_key)` 2. Key lookup always includes tenant context from auth token 3. Return 404 if key belongs to a different tenant | High |
Data integrity | Parameter mismatch: same key sent with different request body, server returns wrong cached result | Key existence check | 1. Store request fingerprint (SHA-256, canonicalized) with the key 2. Compare fingerprint on retry; return 422 if mismatched | High |
Financial integrity | Partial failure: external service processes the request but the local write fails, next retry double-processes | Durable key store | 1. Write-first: save intent + key in one DB transaction before calling external service 2. Outbox pattern: publish external calls from durable queue 3. Reconciliation: compare external records to internal ledger daily | High |
Payment flow integrity | Incomplete coverage: some endpoints in a multi-step flow enforce idempotency, others don't | Per-endpoint implementation | 1. Require the `Idempotency-Key` header on every state-changing endpoint in the flow 2. Return 400 if a POST is missing the header 3. Audit: enumerate all state-changing endpoints and verify each enforces the header | High |
Request processing path | Fail-open on degraded store: key store unavailable, handler bypasses reservation and processes the request anyway | Required header, durable key store | 1. Fail closed: reject state-changing requests if reservation or lookup cannot complete 2. Alert on reservation failure rate and store health 3. Circuit-break state-changing endpoints when reservation path is unhealthy | High |
Outbox worker / downstream execution | Worker replay: worker retries the same durable intent multiple times due to retries, poison-pill loops, or missing downstream correlation | Durable outbox, retryable worker | 1. Carry the idempotency key or correlation ID into downstream provider calls 2. Record downstream object / provider ID before marking outbox item complete 3. Dead-letter repeatedly failing items after bounded retries 4. Alert on dead-letter queue depth | High |
Cached responses | Key enumeration: attacker guesses key values to probe for cached responses | UUID v4 (128-bit entropy) | 1. Composite lookup requires a matching `tenant_id` 2. Rate-limit key lookups per caller | Low |
Request integrity | Replay after TTL: attacker intercepts a request, replays it after the key expires to re-trigger processing | Key TTL | 1. Request-level timestamps: reject requests older than a threshold 2. Bind key to session or auth token, not just tenant 3. Shorter TTL for high-value operations | Medium |
Availability | Error caching: 500 response cached, retries return stale error after the bug is fixed | Response storage | 1. Only cache 2xx and 4xx responses; let 5xx be retried fresh 2. If you cache all responses, provide a manual cache-bust for operations teams | Medium |
Retry safety | TTL race: key expires while the client is still retrying, next retry treated as a new request | TTL policy | 1. Set TTL longer than your longest realistic retry window 2. Return a header indicating key expiry time 3. For multi-day flows, extend TTL or use a separate tracking mechanism | Medium |
Key store capacity | Key spraying: attacker creates thousands of keys per second, exhausting storage | Auth required | 1. Rate-limit key creation per tenant 2. Reaper job deletes expired keys on schedule 3. Cap active keys per tenant | Low |
Response freshness | Stale cache: cached "success" for a resource later cancelled or reversed | Immutable response cache | 1. Cache the response as-is; don't sync with downstream state changes 2. Idempotency cache answers "did this request already run?" not "what is the current state?" 3. Clients needing current state call a separate GET endpoint | Low |
Verification checklist
Key reservation and tenant isolation
Send two identical POST requests simultaneously; only one creates the resource, both return the same response
Same key with different request body returns 422
Two different tenants send the same key value; each gets their own independent result, no cached-response bleed
Same key and payload but different auth/session context: verify lookup is scoped correctly
Keys have `(tenant_id, idempotency_key)` as the unique constraint
Concurrent and distributed behavior
Concurrent requests with the same key from different app nodes: only one path performs side effects
`409 Conflict` response includes a `Retry-After` header
POST request without an `Idempotency-Key` header returns 400
Failure handling and resilience
Crash the worker after the external side effect succeeds but before local completion is recorded; retry and reconciliation do not create a duplicate side effect
Bring the key store down; endpoint fails closed before any business mutation or external call
First request returns 500; retry re-attempts processing instead of returning cached 500
Every state-changing endpoint in a multi-step flow either enforces `Idempotency-Key` or rejects the request
Lifecycle
Retry after TTL expiry creates a new request and does not silently duplicate the original side effect
Reaper job deletes expired keys without locking the table for production traffic
Observability and detection
Duplicate/replay metrics and alerts exist
Duplicate request rate is tracked per tenant (spikes may indicate client bugs or replay attacks)
Stale-processing sweeper resolves stuck keys within the lock timeout window
Cached response for tenant A is not accessible by tenant B
Implementation & Review
The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
