When your SaaS calls third-party APIs on behalf of customers, you store their OAuth access and refresh tokens. You are a credential custodian: if your token store is breached, attackers reach not just your data but your customers' data on every connected platform. This post documents the token vault pattern, a centralized token service that owns all credential storage, decryption, refresh, and outbound mediation, so that application code never touches raw third-party tokens.
System description
A centralized token service encrypts and stores OAuth credentials per tenant, refreshes them under lock, and makes outbound API calls on behalf of application code. Application code sends requests in authenticated tenant context, specifying integration_id; it never receives, caches, or persists credentials itself.
After the initial OAuth consent exchange, the API passes tokens to the token service for encrypted storage. At runtime, every outbound call using customer credentials flows through the token service:

Application code never has a direct arrow to the third-party API.
Golden path
Build this first. Then relax constraints only if you have a specific reason:
Authenticate caller → derive tenant from auth context → resolve credential in token service → refresh under lock → make outbound call via token service → return response without credential
Each step is a gate. If the credential is missing, the lock cannot be acquired, or the provider rejects a refresh, the outbound call fails and the tenant is notified.
If you outsource token custody to an external broker, you are delegating this pattern rather than building it. Evaluate the broker's encryption, tenant isolation, and incident response posture the way you would evaluate your own. The trade-off shifts from "you own the implementation" to "you depend on a third party for every outbound call".
Minimal system context
API / control plane (authorization): authenticates tenants, handles OAuth consent flow, dispatches outbound requests to the token service
Token service (credential management, outbound mediation): stores, decrypts, refreshes, and uses third-party tokens
Encrypted token store (data plane): database table holding encrypted token records, scoped by tenant_id
KMS / HSM (key management): manages key encryption keys; the token service calls KMS to unwrap per-tenant data encryption keys
Distributed lock backend (refresh serialization): Redis, DB advisory locks, or equivalent. Prevents concurrent refresh of the same token
Cleanup worker (revocation): periodic job that catches zombie tokens for disabled or churned tenants
Core design
Token service (credential management and outbound mediation)
Application code calls two functions:
execute_request(auth_context, integration_id, request_spec): The primary interface. The token service extracts tenant_id from the caller's verified auth context, resolves the credential for this tenant and integration, refreshes under lock if expired, injects the Authorization header, makes the outbound HTTP call, and returns the response. Application code sends a request description (method, path, body); it never sees the token
store_token(auth_context, integration_id, token_response): Called after a successful OAuth exchange. Extracts tenant_id from auth context, encrypts the access and refresh tokens with the tenant's DEK, and persists the record
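A minimal sketch of that two-function surface, assuming an in-memory dict in place of the encrypted store and a caller-supplied transport callable in place of a real HTTP client. The type names (AuthContext, RequestSpec, ProviderResponse, Transport) are hypothetical, and encryption and refresh are left out to keep the shape of the interface visible:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class AuthContext:
    """Verified caller identity; tenant_id comes from authentication, not request parameters."""
    tenant_id: str


@dataclass(frozen=True)
class RequestSpec:
    method: str
    path: str
    body: Optional[bytes] = None


@dataclass(frozen=True)
class ProviderResponse:
    status: int
    body: bytes


class IntegrationNotFound(Exception):
    pass


# Hypothetical transport signature: (method, path, headers, body) -> ProviderResponse
Transport = Callable[[str, str, Dict[str, str], Optional[bytes]], ProviderResponse]


class TokenService:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        # Stand-in for the encrypted store: (tenant_id, integration_id) -> access token.
        self._tokens: Dict[Tuple[str, str], str] = {}

    def store_token(self, auth: AuthContext, integration_id: str, token_response: dict) -> None:
        # A real service would encrypt with the tenant's DEK before persisting.
        self._tokens[(auth.tenant_id, integration_id)] = token_response["access_token"]

    def execute_request(
        self, auth: AuthContext, integration_id: str, spec: RequestSpec
    ) -> ProviderResponse:
        # The lookup key always includes the tenant from the verified auth context.
        token = self._tokens.get((auth.tenant_id, integration_id))
        if token is None:
            raise IntegrationNotFound("integration not found")
        headers = {"Authorization": f"Bearer {token}"}
        # Only the provider's response crosses back to application code.
        return self._transport(spec.method, spec.path, headers, spec.body)
```

Application code holds a TokenService reference and a RequestSpec, and nothing else. The token exists only inside execute_request and in the headers handed to the transport.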
Large payloads: The execute_request proxy pattern is designed for standard API calls (JSON request / response). For large file uploads or bulk data exports, stream request and response bodies end-to-end without buffering, or use a dedicated network proxy (Envoy, nginx) where the token service supplies credentials at the edge rather than acting as the data plane itself.
Encrypted token store (data plane)
A database table holding encrypted token records. Minimum fields:
integration_id: Unique per tenant-provider connection
tenant_id: The isolation boundary (immutable after creation, included in every query)
provider: The OAuth provider (e.g., salesforce, google, slack)
encrypted_access_token, encrypted_refresh_token: Ciphertext (AES-256-GCM)
encrypted_dek: The data encryption key, wrapped by the tenant's KEK
scopes: The granted scope string (stored to detect drift)
expires_at, created_at, revoked_at
Envelope encryption keeps raw tokens out of the database:
Each tenant gets a data encryption key (DEK), an AES-256-GCM symmetric key generated at tenant provisioning
The DEK is encrypted by a key encryption key (KEK) in cloud KMS or HSM. The wrapped DEK is stored alongside the token record
To decrypt a token, the token service calls KMS to unwrap the DEK, then decrypts the token ciphertext
With automatic rotation enabled, KMS handles KEK rotation for you: old ciphertexts stay decryptable, new encryptions use the latest key version. The token service never sees the KEK in plaintext. On integration disconnect, any cached plaintext DEK for that tenant-integration pair must be evicted before the next request cycle.
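The envelope flow above can be sketched as follows, assuming the third-party cryptography package for AES-256-GCM. LocalKms is a hypothetical in-process stand-in for cloud KMS or an HSM, and binding tenant_id as associated data is an extra assumption of this sketch, not something the pattern requires:

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the usual size for AES-GCM


class LocalKms:
    """Stand-in for cloud KMS: holds the KEK and only exposes wrap and unwrap."""

    def __init__(self) -> None:
        self._kek = AESGCM(AESGCM.generate_key(bit_length=256))

    def wrap(self, dek: bytes, tenant_id: str) -> bytes:
        nonce = os.urandom(NONCE_LEN)
        return nonce + self._kek.encrypt(nonce, dek, tenant_id.encode())

    def unwrap(self, wrapped: bytes, tenant_id: str) -> bytes:
        nonce, body = wrapped[:NONCE_LEN], wrapped[NONCE_LEN:]
        return self._kek.decrypt(nonce, body, tenant_id.encode())


def provision_dek(kms: LocalKms, tenant_id: str) -> bytes:
    """Generate a fresh DEK and return only its wrapped form for storage."""
    return kms.wrap(AESGCM.generate_key(bit_length=256), tenant_id)


def encrypt_token(kms: LocalKms, tenant_id: str, wrapped_dek: bytes, token: str) -> bytes:
    dek = kms.unwrap(wrapped_dek, tenant_id)
    nonce = os.urandom(NONCE_LEN)
    # tenant_id as associated data ties the ciphertext to its owner (sketch assumption).
    return nonce + AESGCM(dek).encrypt(nonce, token.encode(), tenant_id.encode())


def decrypt_token(kms: LocalKms, tenant_id: str, wrapped_dek: bytes, blob: bytes) -> str:
    dek = kms.unwrap(wrapped_dek, tenant_id)
    nonce, body = blob[:NONCE_LEN], blob[NONCE_LEN:]
    # Raises InvalidTag if the DEK or the tenant binding does not match.
    return AESGCM(dek).decrypt(nonce, body, tenant_id.encode()).decode()
```

Only wrapped DEKs and token ciphertext would reach the database. With a real KMS, wrap and unwrap become network calls, which is why a short-lived DEK cache (and its eviction on disconnect) tends to appear in practice.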
Threat model
Baseline assumptions
Your SaaS authenticates tenants and derives tenant_id from verified credentials, not request parameters
OAuth providers implement RFC 6749 correctly (authorization code flow, token endpoint) and expose a revocation endpoint (RFC 7009)
Tokens are bearer credentials: anyone holding a valid access token can use it. Token binding (DPoP, mTLS) is not yet widely supported by third-party providers, so the architecture does not assume it
Standard OAuth flow hardening (exact-match redirect URIs, PKCE, CSRF-bound state parameter) is in place. This model focuses on what happens after tokens are acquired
Standard infra controls (TLS, WAF, database AuthN, SQLi prevention) are in place
A note on risk: you won’t fix everything
This table isn’t a checklist where every row must be fully eliminated. Focus on preventing the worst failures and limiting blast radius. In practice: ship prevention for the High rows first, then add monitoring and response for what you can’t realistically prevent.
Phase 1: Token storage and lifecycle
Focus: Preventing token exposure, enforcing tenant isolation, and managing credential lifecycle
Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
Token store | Bulk exposure: Database breach, backup leak, or snapshot copy exposes all tokens | Database access controls | 1. Envelope encryption: per-tenant DEK wrapped by KEK in KMS / HSM 2. Per-tenant keys: a single DEK compromise limits blast radius to one tenant 3. No plaintext path: verify no token value appears in logs, error messages, monitoring, or backups | High |
Tenant isolation | Cross-tenant token access: Application bug or IDOR allows one tenant's code path to use another tenant's credential through the token service | Auth context | 1. Tenant filter: every token service lookup includes tenant_id from the verified auth context 2. Opaque errors: cross-tenant lookups return "integration not found", never "access denied" | High |
Stored scopes | Scope overreach: Stored tokens carry broader scopes than the integration uses, so a vault compromise gives attackers more access than the feature requires | OAuth consent screen | 1. Scope inventory: document required scopes per integration and compare against stored grants 2. Scope drift detection: compare stored scopes on each refresh and alert on changes | Medium |
Refresh token | Rotation race: Two concurrent requests trigger refresh simultaneously. The provider rotates the refresh token on first use; the second caller sends a stale token, and the provider revokes the entire token family | Centralized refresh | 1. Distributed lock: acquire a per-integration lock before refreshing 2. Wait and re-read: if the lock is held, back off and read the (likely already refreshed) token 3. Atomic store: persist the new refresh token before releasing the lock 4. Fail closed: if the lock backend is unavailable, reject the outbound request rather than refreshing without coordination | Low |
Encryption keys | KEK compromise: Attacker gains KMS access, making all DEK encryption ineffective | Cloud KMS access policies | 1. Least privilege: restrict KMS decrypt to the token service's IAM role only 2. Audit: alert on decrypt calls from unexpected principals or unusual volume 3. Automatic KEK rotation in KMS | Medium |
Token lifecycle | Zombie tokens: Tenant disconnects an integration or churns, but tokens persist and remain usable at the provider | Manual cleanup | 1. Revoke at provider: call the revocation endpoint (RFC 7009) on disconnect 2. Delete local: purge the encrypted record and evict any cached plaintext 3. Sweep: periodic job catches tokens the disconnect flow missed | Low |
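The rotation-race mitigations (lock, re-read, persist before release, fail closed) can be sketched as below. This is an illustration under stated assumptions: an in-process threading.Lock stands in for the distributed lock backend, a plain dict stands in for the encrypted store, and RefreshCoordinator, TokenRecord, and LockUnavailable are hypothetical names:

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenRecord:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp


class LockUnavailable(Exception):
    """Raised instead of refreshing without coordination (fail closed)."""


class RefreshCoordinator:
    def __init__(self, refresh_fn: Callable[[str], TokenRecord], lock_timeout: float = 5.0) -> None:
        self._refresh_fn = refresh_fn
        self._lock_timeout = lock_timeout
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()

    def _lock_for(self, integration_id: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(integration_id, threading.Lock())

    def get_fresh(self, integration_id: str, store: Dict[str, TokenRecord]) -> TokenRecord:
        record = store[integration_id]
        if record.expires_at > time.time():
            return record
        lock = self._lock_for(integration_id)
        if not lock.acquire(timeout=self._lock_timeout):
            raise LockUnavailable(integration_id)
        try:
            # Re-read: another caller may have refreshed while this one waited.
            record = store[integration_id]
            if record.expires_at > time.time():
                return record
            new_record = self._refresh_fn(record.refresh_token)
            # Persist the rotated refresh token before the lock is released.
            store[integration_id] = new_record
            return new_record
        finally:
            lock.release()
```

With Redis or database advisory locks the acquire and release calls change, but the order of operations stays the same: acquire, re-read, refresh, persist, release.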
Phase 2: Outbound token use
Focus: Preventing credential misrouting and leakage when the token service makes API calls on behalf of tenants
Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
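Outbound request | Confused deputy: Bug in the token service resolves the wrong tenant's credential for an outbound call, making an API request using another customer's token | Tenant context binding | 1. Strict lookup: resolve the credential by tenant_id and integration_id together, with tenant_id taken from the verified auth context 2. No fallback: if the lookup returns no match, fail the request, never try a broader search 3. Audit: log tenant_id, integration_id, target host, and HTTP status for every outbound call | High |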
Token service | Outage: The token service is unavailable, blocking all outbound integrations for all tenants | Redundant deployment | 1. High availability: deploy the token service with replica count and health checks matching your SLA 2. Graceful degradation: application code receives a clear "integration unavailable" error, not a timeout 3. No bypass: application code must not cache or store tokens as a fallback when the service is down | Medium |
Access tokens | Credential leakage: Access token appears in request logs, error stack traces, monitoring dashboards, or crash dumps from the token service process | Standard logging, short-lived cache | 1. Header redaction: strip the Authorization header from logs and error output 2. Structured logging: field-level redaction rules in the logging framework 3. Short TTL: cache entries expire in minutes, evict on disconnect 4. Process isolation: restricted core dump settings for the token service | Medium |
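One way to approach redaction in Python's standard logging module is a filter that rewrites each record before it is emitted, plus a helper for header dicts. The names RedactTokens and redact_headers are hypothetical, and the bearer pattern is a rough assumption about token shape that will not catch every credential format:

```python
import logging
import re
from typing import Dict

REDACTED = "[REDACTED]"
SENSITIVE_HEADERS = {"authorization", "proxy-authorization", "cookie"}

# Matches "Bearer <token>" in any case; the character class is an approximation.
_BEARER = re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+")


def redact_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers that is safe to log."""
    return {
        name: REDACTED if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


class RedactTokens(logging.Filter):
    """Scrub bearer tokens from the fully formatted log message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so tokens passed as %-style args are scrubbed too.
        message = record.getMessage()
        record.msg = _BEARER.sub(r"\1 " + REDACTED, message)
        record.args = ()
        return True
```

Attaching the filter to each handler (handler.addFilter(RedactTokens())) covers records from every logger that routes through it. Pattern-based scrubbing is a backstop; not passing token values to the logger in the first place remains the primary control.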
Verification checklist
Token encryption
Each tenant's tokens are encrypted with a distinct DEK
KMS decrypt permissions are restricted to the token service's IAM role
Decrypting a token record with a different tenant's DEK fails
Tenant isolation
Querying for a token with a valid integration_id but wrong tenant_id returns "not found"
Every query path includes WHERE tenant_id = ? with tenant_id from verified auth context
Cross-tenant lookups return identical responses whether the integration exists or not
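These isolation checks can be exercised against a small tenant-scoped lookup. The sketch below uses an in-memory SQLite table with a reduced column set; open_store and load_token are hypothetical helper names, and a production store would hold the full record described earlier:

```python
import sqlite3


class IntegrationNotFound(Exception):
    """Raised for both missing and cross-tenant lookups, so responses are identical."""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tokens (
               integration_id TEXT PRIMARY KEY,
               tenant_id TEXT NOT NULL,
               provider TEXT NOT NULL,
               encrypted_access_token BLOB NOT NULL
           )"""
    )
    return conn


def load_token(conn: sqlite3.Connection, tenant_id: str, integration_id: str) -> bytes:
    """Fetch ciphertext for one integration; tenant_id must come from verified auth context."""
    row = conn.execute(
        "SELECT encrypted_access_token FROM tokens "
        "WHERE tenant_id = ? AND integration_id = ?",
        (tenant_id, integration_id),
    ).fetchone()
    if row is None:
        # Same error whether the integration is absent or belongs to another tenant.
        raise IntegrationNotFound("integration not found")
    return row[0]
```

Because the tenant filter lives in the query itself, there is no code path that fetches a record first and checks ownership afterwards.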
Token lifecycle
Two concurrent requests for the same expired token result in exactly one provider refresh call
If the lock backend is unavailable, refresh fails closed and the outbound request is rejected
The new refresh token is persisted atomically before the lock is released
A failed refresh marks the integration as degraded and alerts the tenant
Stored scopes are compared on each refresh; scope changes trigger an alert
Disconnecting an integration revokes at the provider and deletes the local record
A periodic sweep catches zombie tokens for disabled or churned tenants
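The disconnect and sweep items can be sketched together, since the sweep is the disconnect flow applied to every integration whose tenant is no longer active. Everything here is illustrative: the store and cache are plain dicts, revoke_at_provider is a caller-supplied callable standing in for an RFC 7009 revocation request, and StoredIntegration is a hypothetical record type:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class StoredIntegration:
    integration_id: str
    tenant_id: str
    refresh_token: str  # ciphertext in a real store; plaintext here for brevity


def disconnect(
    integration_id: str,
    store: Dict[str, StoredIntegration],
    cache: Dict[str, str],
    revoke_at_provider: Callable[[str], None],
) -> None:
    record = store.get(integration_id)
    if record is None:
        return
    # Revoke first: if this raises, the record stays and the next sweep retries.
    revoke_at_provider(record.refresh_token)
    del store[integration_id]
    cache.pop(integration_id, None)


def sweep_zombies(
    store: Dict[str, StoredIntegration],
    cache: Dict[str, str],
    active_tenants: Set[str],
    revoke_at_provider: Callable[[str], None],
) -> List[str]:
    """Disconnect every integration whose tenant is no longer active."""
    zombies = [
        integration_id
        for integration_id, record in store.items()
        if record.tenant_id not in active_tenants
    ]
    for integration_id in zombies:
        disconnect(integration_id, store, cache, revoke_at_provider)
    return zombies
```

Revoking before deleting is a deliberate choice in this sketch: deleting first would leave a token that is still live at the provider with no local record to retry from.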
Outbound safety
The execute_request response contains the third-party API's response, not the credential used
Token values do not appear in application logs, error messages, or monitoring dashboards
Every outbound call is logged with tenant_id, integration_id, target host, and HTTP status
Detection
KMS throttling or timeout causes a controlled failure and alert, not unbounded retries
Alerts fire on: refresh failure rate exceeding threshold, KMS decrypt calls from unexpected principals, lock backend health degradation
Token usage logs support investigating "which tenant's credentials were used to call which API at what time"
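For the investigation question above, one JSON line per outbound call is usually enough. A minimal sketch, where the field names other than tenant_id and integration_id are assumptions and audit_record is a hypothetical helper:

```python
import json
from datetime import datetime, timezone


def audit_record(tenant_id: str, integration_id: str, target_host: str, status: int) -> str:
    """Serialize one outbound call as a JSON line; no token material is included."""
    return json.dumps(
        {
            "ts": datetime.now(timezone.utc).isoformat(),
            "event": "outbound_call",
            "tenant_id": tenant_id,
            "integration_id": integration_id,
            "target_host": target_host,
            "http_status": status,
        },
        sort_keys=True,
    )
```

Filtering these lines by tenant_id and time range answers which tenant's credentials reached which host and when, without the log ever carrying a credential.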
Implementation & Review
The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
