Securing SaaS OAuth Integrations: Threat Model & Architecture

When your SaaS calls third-party APIs on behalf of customers, you store their OAuth access and refresh tokens. You are a credential custodian: if your token store is breached, attackers reach not just your data but your customers' data on every connected platform. This post documents the token vault pattern, a centralized token service that owns all credential storage, decryption, refresh, and outbound mediation, so that application code never touches raw third-party tokens.

System description

A centralized token service encrypts and stores OAuth credentials per tenant, refreshes them under lock, and makes outbound API calls on behalf of application code. Application code sends requests in authenticated tenant context, specifying integration_id; it never receives, caches, or persists credentials itself.

After the initial OAuth consent exchange, the API passes tokens to the token service for encrypted storage. At runtime, every outbound call using customer credentials flows through the token service.

Application code never has a direct path to the third-party API.

Golden path

Build this first. Then relax constraints only if you have a specific reason:

Authenticate caller → derive tenant from auth context → resolve credential in token service → refresh under lock → make outbound call via token service → return response without credential

Each step is a gate. If the credential is missing, the lock cannot be acquired, or the provider rejects a refresh, the outbound call fails and the tenant is notified.

If you outsource token custody to an external broker, you are delegating this pattern rather than building it. Evaluate the broker's encryption, tenant isolation, and incident response posture the way you would evaluate your own. The trade-off shifts from "you own the implementation" to "you depend on a third party for every outbound call".

Related patterns:

If your token service also makes outbound calls to customer-controlled URLs, pair this with Secure Webhook Delivery: Signing, Verification, and SSRF Prevention for egress controls and destination validation.
If agents or tools use these third-party credentials, put the token service behind an authorization layer like AI Agent Gateway: The Authorization Chokepoint.
If workloads need cloud access, prefer Kubernetes Workload Identity: Eliminating Static Cloud Credentials over long-lived cloud keys.

Minimal system context

API / control plane (authorization): authenticates tenants, handles OAuth consent flow, dispatches outbound requests to the token service
Token service (credential management, outbound mediation): stores, decrypts, refreshes, and uses third-party tokens
Encrypted token store (data plane): database table holding encrypted token records, scoped by tenant_id
KMS / HSM (key management): manages key encryption keys; the token service calls KMS to unwrap per-tenant data encryption keys
Distributed lock backend (refresh serialization): Redis, DB advisory locks, or equivalent. Prevents concurrent refresh of the same token
Cleanup worker (revocation): periodic job that catches zombie tokens for disabled or churned tenants

Core design

Token service (credential management and outbound mediation)

Application code calls two functions:

execute_request(auth_context, integration_id, request_spec): The primary interface. The token service extracts tenant_id from the caller's verified auth context, resolves the credential for this tenant and integration, refreshes under lock if expired, injects the Authorization header, makes the outbound HTTP call, and returns the response. Application code sends a request description (method, path, body); it never sees the token
store_token(auth_context, integration_id, token_response): Called after a successful OAuth exchange. Extracts tenant_id from auth context, encrypts the access and refresh tokens with the tenant's DEK, and persists the record
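
As an illustration only, the two-function interface could be sketched as below. The class names (AuthContext, RequestSpec, Response, TokenService), the injected transport callable, and the in-memory dictionary standing in for the encrypted store are all assumptions made for this example. store_token is simplified to take a bare access token, and encryption and refresh are omitted to keep the shape of the interface visible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class AuthContext:
    """Verified caller identity; tenant_id comes from authentication, not the request."""
    tenant_id: str


@dataclass(frozen=True)
class RequestSpec:
    """What application code is allowed to describe: method, path, body."""
    method: str
    path: str
    body: Optional[bytes] = None


@dataclass(frozen=True)
class Response:
    status: int
    body: bytes


class IntegrationNotFound(Exception):
    """Raised for missing and cross-tenant lookups alike."""


# Hypothetical transport: (method, path, headers, body) -> Response.
Transport = Callable[[str, str, Dict[str, str], Optional[bytes]], Response]


class TokenService:
    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        # Stand-in for the encrypted store, keyed by (tenant_id, integration_id).
        self._tokens: Dict[Tuple[str, str], str] = {}

    def store_token(self, auth: AuthContext, integration_id: str, access_token: str) -> None:
        # A real service would encrypt with the tenant's DEK before persisting.
        self._tokens[(auth.tenant_id, integration_id)] = access_token

    def execute_request(self, auth: AuthContext, integration_id: str, spec: RequestSpec) -> Response:
        # tenant_id is taken from the verified auth context, never from the request.
        token = self._tokens.get((auth.tenant_id, integration_id))
        if token is None:
            raise IntegrationNotFound("integration not found")
        headers = {"Authorization": f"Bearer {token}"}
        # Only the provider's response leaves this method; the token does not.
        return self._transport(spec.method, spec.path, headers, spec.body)
```

Because the caller only ever passes a RequestSpec and receives a Response, there is no return path through which the credential could reach application code.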

Large payloads: The execute_request proxy pattern is designed for standard API calls (JSON request / response). For large file uploads or bulk data exports, stream request and response bodies end-to-end without buffering, or use a dedicated network proxy (Envoy, nginx) where the token service supplies credentials at the edge rather than acting as the data plane itself.

Encrypted token store (data plane)

A database table holding encrypted token records. Minimum fields:

integration_id: Unique per tenant-provider connection
tenant_id: The isolation boundary (immutable after creation, included in every query)
provider: The OAuth provider (e.g., salesforce, google, slack)
encrypted_access_token, encrypted_refresh_token: Ciphertext (AES-256-GCM)
encrypted_dek: The data encryption key, wrapped by the tenant's KEK
scopes: The granted scope string (stored to detect drift)
expires_at, created_at, revoked_at
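
One possible shape for this table, sketched with Python's built-in sqlite3 module so it runs anywhere. The table name oauth_tokens, the column types, the index, and the helper names open_store and find_token are assumptions for illustration; a production store would use your own database and migration tooling.

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE oauth_tokens (
    integration_id          TEXT PRIMARY KEY,
    tenant_id               TEXT NOT NULL,
    provider                TEXT NOT NULL,
    encrypted_access_token  BLOB NOT NULL,
    encrypted_refresh_token BLOB,
    encrypted_dek           BLOB NOT NULL,
    scopes                  TEXT NOT NULL,
    expires_at              TEXT,
    created_at              TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    revoked_at              TEXT
);
CREATE INDEX idx_oauth_tokens_tenant ON oauth_tokens (tenant_id, integration_id);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory store for demonstration purposes."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript(SCHEMA)
    return conn


def find_token(conn: sqlite3.Connection, tenant_id: str, integration_id: str) -> Optional[sqlite3.Row]:
    """Look up a live record; tenant_id must come from verified auth context."""
    return conn.execute(
        "SELECT * FROM oauth_tokens "
        "WHERE tenant_id = ? AND integration_id = ? AND revoked_at IS NULL",
        (tenant_id, integration_id),
    ).fetchone()
```

A lookup with a valid integration_id but the wrong tenant_id returns None, the same result as an integration that does not exist.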

Envelope encryption keeps raw tokens out of the database:

Each tenant gets a data encryption key (DEK), a 256-bit AES key used in GCM mode, generated at tenant provisioning
The DEK is encrypted by a key encryption key (KEK) in cloud KMS or HSM. The wrapped DEK is stored alongside the token record
To decrypt a token, the token service calls KMS to unwrap the DEK, then decrypts the token ciphertext

With automatic rotation enabled, KMS handles KEK rotation for you: old ciphertexts stay decryptable, new encryptions use the latest key version. The token service never sees the KEK in plaintext. On integration disconnect, any cached plaintext for that tenant-integration pair (decrypted tokens, and the tenant's unwrapped DEK if no other integration still needs it) must be evicted before the next request cycle.
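
The wrap, unwrap, and decrypt steps above can be sketched as follows. This is a minimal sketch, assuming the third-party `cryptography` package is installed. FakeKms is a local stand-in for a real cloud KMS and exists only so the example runs. The helper names are hypothetical, and binding tenant_id and integration_id as associated data is an extra hardening choice made for this example, not something the pattern requires.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, the standard size for AES-GCM


class FakeKms:
    """Local stand-in for cloud KMS: holds the KEK and only wraps or unwraps DEKs."""

    def __init__(self) -> None:
        self._kek = AESGCM(AESGCM.generate_key(bit_length=256))

    def wrap(self, dek: bytes, tenant_id: str) -> bytes:
        nonce = os.urandom(NONCE_LEN)
        return nonce + self._kek.encrypt(nonce, dek, tenant_id.encode())

    def unwrap(self, wrapped: bytes, tenant_id: str) -> bytes:
        nonce, ciphertext = wrapped[:NONCE_LEN], wrapped[NONCE_LEN:]
        return self._kek.decrypt(nonce, ciphertext, tenant_id.encode())


def new_wrapped_dek(kms: FakeKms, tenant_id: str) -> bytes:
    """Generate a fresh 256-bit DEK and return only its wrapped form."""
    return kms.wrap(AESGCM.generate_key(bit_length=256), tenant_id)


def encrypt_token(kms: FakeKms, wrapped_dek: bytes, tenant_id: str,
                  integration_id: str, token: str) -> bytes:
    dek = kms.unwrap(wrapped_dek, tenant_id)
    nonce = os.urandom(NONCE_LEN)
    aad = f"{tenant_id}:{integration_id}".encode()  # binds ciphertext to its owner
    return nonce + AESGCM(dek).encrypt(nonce, token.encode(), aad)


def decrypt_token(kms: FakeKms, wrapped_dek: bytes, tenant_id: str,
                  integration_id: str, blob: bytes) -> str:
    """Raises cryptography.exceptions.InvalidTag if the key or owner does not match."""
    dek = kms.unwrap(wrapped_dek, tenant_id)
    aad = f"{tenant_id}:{integration_id}".encode()
    return AESGCM(dek).decrypt(blob[:NONCE_LEN], blob[NONCE_LEN:], aad).decode()
```

Decrypting one tenant's record with another tenant's DEK fails authentication rather than returning garbage, which is the behavior the verification checklist asks you to test.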

Threat model

Baseline assumptions

Your SaaS authenticates tenants and derives tenant_id from verified credentials, not request parameters
OAuth providers implement RFC 6749 correctly (authorization code flow, token endpoint) and, where they offer token revocation, RFC 7009
Tokens are bearer credentials: anyone holding a valid access token can use it. Token binding (DPoP, mTLS) is not yet widely supported by third-party providers, so the architecture does not assume it
Standard OAuth flow hardening (exact-match redirect URIs, PKCE, CSRF-bound state parameter) is in place. This model focuses on what happens after tokens are acquired
Standard infra controls (TLS, WAF, database AuthN, SQLi prevention) are in place

A note on risk: you won’t fix everything

This table isn’t a checklist where every row must be fully eliminated. Focus on preventing the worst failures and limiting blast radius. In practice: ship prevention for the High rows first, then add monitoring and response for what you can’t realistically prevent.

Phase 1: Token storage and lifecycle

Focus: Preventing token exposure, enforcing tenant isolation, and managing credential lifecycle

Asset	Threat	Baseline Controls	Mitigation Options	Risk
Token store	Bulk exposure: Database breach, backup leak, or snapshot copy exposes all tokens	Database access controls	1. Envelope encryption: per-tenant DEK wrapped by KEK in KMS / HSM 2. Per-tenant keys: a single DEK compromise limits blast radius to one tenant 3. No plaintext path: verify no token value appears in logs, error messages, monitoring, or backups	High
Tenant isolation	Cross-tenant token access: Application bug or IDOR allows one tenant's code path to use another tenant's credential through the token service	Auth context	1. Tenant filter: every token service lookup includes `WHERE tenant_id = ?` with `tenant_id` from verified auth context, never from request parameters 2. Opaque errors: cross-tenant lookups return "integration not found", never "access denied"	High
Stored scopes	Scope overreach: Stored tokens carry broader scopes than the integration uses, so a vault compromise gives attackers more access than the feature requires	OAuth consent screen	1. Scope inventory: document required scopes per integration and compare against stored grants 2. Scope drift detection: compare stored `scopes` against provider response on each refresh, alert on unexpected expansion	Medium
Refresh token	Rotation race: Two concurrent requests trigger refresh simultaneously. The provider rotates the refresh token on first use; the second caller sends a stale token, and the provider revokes the entire token family	Centralized refresh	1. Distributed lock: acquire a per-integration lock before refreshing 2. Wait and re-read: if the lock is held, back off and read the (likely already refreshed) token 3. Atomic store: persist the new refresh token before releasing the lock 4. Fail closed: if the lock backend is unavailable, reject the outbound request rather than refreshing without coordination	Low
Encryption keys	KEK compromise: Attacker gains KMS decrypt access and can unwrap every DEK, defeating the envelope encryption	Cloud KMS access policies	1. Least privilege: restrict KMS decrypt to the token service's IAM role only 2. Audit: alert on decrypt calls from unexpected principals or unusual volume 3. Rotation: enable automatic KEK rotation in KMS	Medium
Token lifecycle	Zombie tokens: Tenant disconnects an integration or churns, but tokens persist and remain usable at the provider	Manual cleanup	1. Revoke at provider: call the revocation endpoint (RFC 7009) on disconnect 2. Delete local: purge the encrypted record and evict any cached plaintext 3. Sweep: periodic job catches tokens the disconnect flow missed	Low
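
The rotation-race mitigations (lock, wait and re-read, persist before release, fail closed) can be sketched as below. This is a single-process illustration: threading.Lock stands in for the distributed lock backend, a dictionary stands in for the token store, and Refresher, TokenRecord, and the injected refresh_fn are hypothetical names. A multi-instance deployment would need a real distributed lock with the same acquire-with-timeout semantics.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TokenRecord:
    access_token: str
    refresh_token: str
    expires_at: float  # Unix timestamp in seconds


# Hypothetical provider call: current refresh token -> freshly issued record.
RefreshFn = Callable[[str], TokenRecord]


class Refresher:
    def __init__(self, refresh_fn: RefreshFn, lock_timeout: float = 5.0) -> None:
        self._refresh_fn = refresh_fn
        self._lock_timeout = lock_timeout
        self._records: Dict[str, TokenRecord] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def put(self, integration_id: str, record: TokenRecord) -> None:
        self._records[integration_id] = record

    def _lock_for(self, integration_id: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(integration_id, threading.Lock())

    def get_valid_token(self, integration_id: str) -> str:
        record = self._records[integration_id]
        if record.expires_at > time.time():
            return record.access_token
        lock = self._lock_for(integration_id)
        # Fail closed: never refresh without holding the per-integration lock.
        if not lock.acquire(timeout=self._lock_timeout):
            raise TimeoutError("refresh lock unavailable; rejecting request")
        try:
            # Re-read: another caller may have refreshed while we waited.
            record = self._records[integration_id]
            if record.expires_at <= time.time():
                record = self._refresh_fn(record.refresh_token)
                # Persist the rotated refresh token before releasing the lock.
                self._records[integration_id] = record
            return record.access_token
        finally:
            lock.release()
```

The re-read inside the lock is what keeps the second caller from presenting a stale refresh token: it sees the record the first caller already stored and returns without contacting the provider.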

Phase 2: Outbound token use

Focus: Preventing credential misrouting and leakage when the token service makes API calls on behalf of tenants

Asset	Threat	Baseline Controls	Mitigation Options	Risk
Outbound request	Confused deputy: Bug in the token service resolves the wrong tenant's credential for an outbound call, making an API request using another customer's token	Tenant context binding	1. Strict lookup: `execute_request` resolves credentials by `tenant_id` + `integration_id`; both must match the same record 2. No fallback: if the lookup returns no match, fail the request, never try a broader search 3. Audit: log `tenant_id`, `integration_id`, and target URL for every outbound call	High
Token service	Outage: The token service is unavailable, blocking all outbound integrations for all tenants	Redundant deployment	1. High availability: deploy the token service with replica count and health checks matching your SLA 2. Graceful degradation: application code receives a clear "integration unavailable" error, not a timeout 3. No bypass: application code must not cache or store tokens as a fallback when the service is down	Medium
Access tokens	Credential leakage: Access token appears in request logs, error stack traces, monitoring dashboards, or crash dumps from the token service process	Standard logging, short-lived cache	1. Header redaction: strip `Authorization` headers from all log outputs 2. Structured logging: field-level redaction rules in the logging framework 3. Short TTL: cache entries expire in minutes, evict on disconnect 4. Process isolation: restricted core dump settings for the token service	Medium
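
Header redaction can be approximated with a standard-library logging filter, as in this sketch. The regular expressions and the RedactCredentials name are assumptions for illustration and will not catch every token format, so treat this as a backstop alongside field-level redaction rather than a replacement for it.

```python
import logging
import re

# "Bearer <token>" anywhere in a message, including inside dict reprs.
_BEARER = re.compile(r"\b(bearer)\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE)
# "Authorization: <scheme> <value>" or "Authorization=<value>" in plain text.
_AUTH = re.compile(r"(authorization\s*[:=]\s*)(?:\w+\s+)?[^\s,;'\"]+", re.IGNORECASE)


def redact(text: str) -> str:
    """Replace credential-looking substrings with a fixed marker."""
    text = _BEARER.sub(r"\1 [REDACTED]", text)
    return _AUTH.sub(r"\1[REDACTED]", text)


class RedactCredentials(logging.Filter):
    """Rewrites each record's final message so tokens never reach the handler."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies %-formatting, so arguments are redacted too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to the handler rather than to a single logger means it also covers records that propagate up from child loggers, for example `handler.addFilter(RedactCredentials())`.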

FAQs

Where should a SaaS store customer OAuth refresh tokens?

Store refresh tokens in a centralized token service, not scattered across application services. The token service should own encryption, refresh, audit logging, and outbound mediation so application code never handles raw credentials.

Should OAuth tokens be encrypted per tenant?

Yes. Use envelope encryption with tenant-scoped data encryption keys where possible. Per-tenant keys reduce blast radius if a token record, backup, or application path is exposed.

Verification checklist

Token encryption
- Each tenant's tokens are encrypted with a distinct DEK
- KMS decrypt permissions are restricted to the token service's IAM role
- Decrypting a token record with a different tenant's DEK fails
Tenant isolation
- Querying for a token with a valid integration_id but wrong tenant_id returns "not found"
- Every query path includes WHERE tenant_id = ? with tenant_id from verified auth context
- Cross-tenant lookups return identical responses whether the integration exists or not
Token lifecycle
- Two concurrent requests for the same expired token result in exactly one provider refresh call
- If the lock backend is unavailable, refresh fails closed and the outbound request is rejected
- The new refresh token is persisted atomically before the lock is released
- A failed refresh marks the integration as degraded and alerts the tenant
- Stored scopes are compared on each refresh; scope changes trigger an alert
- Disconnecting an integration revokes at the provider and deletes the local record
- A periodic sweep catches zombie tokens for disabled or churned tenants
Outbound safety
- The execute_request response contains the third-party API's response, not the credential used
- Token values do not appear in application logs, error messages, or monitoring dashboards
- Every outbound call is logged with tenant_id, integration_id, target host, and HTTP status
Detection
- KMS throttling or timeout causes a controlled failure and alert, not unbounded retries
- Alerts fire on: refresh failure rate exceeding threshold, KMS decrypt calls from unexpected principals, lock backend health degradation
- Token usage logs support investigating "which tenant's credentials were used to call which API at what time"

Implementation & Review

The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
