Source systems have permissions. Vector indexes have data and metadata. If you don't carry authorization through ingestion and retrieval, the index becomes a second copy of your documents with weaker access control. In a RAG system, retrieval is the first authorization decision, not a relevance-only step. If retrieval runs before tenant, document, or field-level checks, the model can summarize data the caller was never allowed to see. This post describes a safe default for preserving source permissions through the retrieval layer, along with its trade-offs.

System description

A RAG access control system resolves caller identity, derives an authorization scope from verified credentials and policy, retrieves only chunks bound to that scope, and sends only approved chunks to the model. The source document system is the authority for permissions and lifecycle state. The retrieval index is a projection, not the source of truth.

RAG Architecture

Architecture choice

There are two ways to enforce source permissions in retrieval, and each comes with different security trade-offs.

Partition first

Each tenant, workspace, or classification tier gets a separate namespace, shard, or collection. Queries still enforce caller scope, but the first boundary is physical or logical partitioning.

Use this when:

  • Tenant isolation matters more than cross-tenant recall

  • You want a smaller blast radius when an application bug slips through

  • Index size or workload patterns already justify separate partitions

Trade-off: Partitioning simplifies isolation but complicates index management, rebalancing, and shared-content workflows.

Filter within partition

Each query carries an authorization scope derived server-side, and retrieval applies it as a hard filter before ranking returns chunks. Document-level or chunk-level ACL metadata controls who sees what inside a partition.

Use this when:

  • Permissions are fine-grained (per-document or per-folder)

  • Users have overlapping access across many document collections

  • You need permission changes to propagate without re-partitioning the store

Main risks: Stale ACL metadata, weak scope derivation, and filter bypass in fallback paths.

Common middle ground: Partition by tenant or workspace first, then apply document-level ACL filters within that partition. Per-chunk ACL filters do not replace a tenant boundary.
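To make the middle ground concrete, here is a minimal in-memory sketch. It assumes a toy index keyed by tenant (INDEX), set-valued ACL groups, and brute-force cosine similarity. All names are hypothetical. A production vector store would apply the same filter inside its approximate nearest-neighbor search rather than over a Python list. The order is what matters: tenant partition first, ACL filter second, ranking last.

```python
import math

# Hypothetical toy index: tenant partition -> [(chunk_id, acl_groups, vector)].
INDEX = {
    "acme": [
        ("a1", {"eng"}, [1.0, 0.0]),
        ("a2", {"finance"}, [0.9, 0.1]),
        ("a3", {"eng", "finance"}, [0.2, 0.8]),
    ],
    "globex": [("g1", {"eng"}, [1.0, 0.0])],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def search(tenant_id, groups, query_vec, k):
    partition = INDEX.get(tenant_id, [])               # 1. tenant boundary
    allowed = [c for c in partition if c[1] & groups]  # 2. ACL filter before ranking
    ranked = sorted(allowed, key=lambda c: cosine(query_vec, c[2]), reverse=True)
    return [chunk_id for chunk_id, _, _ in ranked[:k]]  # 3. rank authorized chunks only
```

For example, search("acme", {"eng"}, [1.0, 0.0], k=5) returns only a1 and a3. The finance-only chunk a2 never enters ranking, and nothing in the globex partition is reachable.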

Golden path

Build this first. Then relax constraints only if you have a specific reason:

Authenticate caller → derive principal server-side → resolve retrieval scope → query authorized partition → verify chunk freshness → send approved chunks to model → bind citations to chunk IDs → log scope and denials

Each step is a gate. If any gate cannot resolve (identity unavailable, scope ambiguous, freshness unverifiable), fail closed. Failing closed raises the denial rate during outages and reindex lag, but that cost is lower than silently widening retrieval scope.
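The golden path can be sketched as a chain of gates. Each gate either resolves or raises. This is a minimal illustration, not a reference implementation. TOKENS, POLICY, and INDEX are hypothetical in-memory stand-ins for the identity provider, policy store, and vector index, ranking is a toy keyword match, and citation binding is left out. The point is that a failed gate ends the request with a logged denial instead of falling through to a broader search.

```python
from dataclasses import dataclass


class DenyError(Exception):
    """Raised by any gate that cannot resolve; the request fails closed."""


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant_id: str
    acl_groups: frozenset
    state: str
    text: str


# Hypothetical stand-ins for the identity provider, policy store, and index.
TOKENS = {"tok-alice": {"sub": "alice", "tenant": "t1"}}
POLICY = {("t1", "alice"): frozenset({"finance"})}
INDEX = {
    "t1": [
        Chunk("c1", "t1", frozenset({"finance"}), "active", "Q3 budget summary"),
        Chunk("c2", "t1", frozenset({"hr"}), "active", "Salary bands"),
        Chunk("c3", "t1", frozenset({"finance"}), "pending_reindex", "Old budget"),
    ]
}


def authenticate(token):
    claims = TOKENS.get(token)
    if claims is None:
        raise DenyError("unauthenticated")
    return claims


def resolve_scope(claims):
    groups = POLICY.get((claims["tenant"], claims["sub"]))
    if groups is None:
        raise DenyError("scope unresolved")  # never fall back to a wider scope
    return claims["tenant"], groups


def retrieve(tenant, groups, query):
    # Query only the caller's partition; ACL and freshness filters run before
    # ranking. Ranking here is a toy keyword match, not vector similarity.
    allowed = [c for c in INDEX.get(tenant, [])
               if c.acl_groups & groups and c.state == "active"]
    return sorted(allowed, key=lambda c: query.lower() in c.text.lower(), reverse=True)


def handle_query(token, query, audit_log):
    try:
        claims = authenticate(token)
        tenant, groups = resolve_scope(claims)
    except DenyError as err:
        audit_log.append({"denied": str(err)})  # log the denial, then fail closed
        raise
    chunks = retrieve(tenant, groups, query)
    audit_log.append({"principal": claims["sub"], "tenant": tenant,
                      "groups": sorted(groups),
                      "chunk_ids": [c.chunk_id for c in chunks]})
    return chunks  # only these chunks may be placed in the prompt
```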

Core design

RAG gateway (control plane)

Authenticates the caller and derives the principal and its group memberships from the identity provider. It then builds the retrieval scope from the policy store and decides which chunks may reach the model. The gateway is the only component that constructs vector queries.

Policy resolver (authorization)

Maps verified identity to allowed tenants, workspaces, document classes, and optional field restrictions. Scope is derived by the resolver and never taken from the client request.
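Here is a sketch of server-side scope derivation. It assumes the verified token has already been decoded into claims with sub and tenant keys, and that the policy store is a simple dictionary. POLICY_STORE, CLIENT_SCOPE_FIELDS, Scope, and derive_scope are hypothetical names. Client-supplied scope fields are captured only so the audit log can compare requested scope with effective scope.

```python
from dataclasses import dataclass

# Hypothetical policy store: (tenant, principal) -> grants.
POLICY_STORE = {
    ("acme", "alice"): {"workspaces": {"sales"}, "classes": {"public", "internal"}},
}

# Request-body fields that look like scope: never trusted, only audited.
CLIENT_SCOPE_FIELDS = {"tenant_id", "group_ids", "namespace", "acl_filter"}


@dataclass(frozen=True)
class Scope:
    tenant_id: str
    workspaces: frozenset
    classes: frozenset


def derive_scope(verified_claims, request_body):
    """Return (effective scope, client-requested scope fields).

    The effective scope comes only from verified claims plus the policy store.
    Client-supplied scope fields are returned for requested-vs-effective audit
    logging but play no part in the decision.
    """
    requested = {k: v for k, v in request_body.items() if k in CLIENT_SCOPE_FIELDS}
    key = (verified_claims["tenant"], verified_claims["sub"])
    grants = POLICY_STORE.get(key)
    if grants is None:
        raise PermissionError("no policy for principal")  # fail closed
    scope = Scope(key[0], frozenset(grants["workspaces"]), frozenset(grants["classes"]))
    return scope, requested
```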

Retrieval index (data plane)

Stores chunk embeddings alongside security metadata. The index doesn't enforce application-level access control on its own, but it supports namespace partitioning and metadata filtering.

Minimum chunk metadata: tenant_id, doc_id, chunk_id, acl_principals or acl_groups, classification (public, internal, restricted), version (source document version at chunking time), state (active, deleted, revoked, pending_reindex).
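Ingest validation for these fields can be as simple as the following sketch. It assumes chunks arrive as dictionaries and that at least one of acl_principals or acl_groups must be non-empty. validate_chunk_metadata, ingest, and the dead-letter list are hypothetical names. Rejected chunks go to the dead-letter list, as the dead-letter mitigation in the Phase 1 threat table recommends, and never reach the searchable index.

```python
CLASSIFICATIONS = {"public", "internal", "restricted"}
STATES = {"active", "deleted", "revoked", "pending_reindex"}
REQUIRED = ("tenant_id", "doc_id", "chunk_id", "classification", "version", "state")


class InvalidChunk(ValueError):
    pass


def validate_chunk_metadata(meta):
    """Raise InvalidChunk unless the chunk carries the minimum security metadata."""
    missing = [f for f in REQUIRED if meta.get(f) in (None, "")]
    # At least one ACL field must be present and non-empty.
    if not (meta.get("acl_principals") or meta.get("acl_groups")):
        missing.append("acl_principals|acl_groups")
    if missing:
        raise InvalidChunk(f"missing metadata: {missing}")
    if meta["classification"] not in CLASSIFICATIONS:
        raise InvalidChunk(f"unknown classification: {meta['classification']}")
    if meta["state"] not in STATES:
        raise InvalidChunk(f"unknown state: {meta['state']}")
    return meta


def ingest(chunks, index, dead_letter):
    """Write valid chunks; park invalid ones where retrieval cannot see them."""
    for meta in chunks:
        try:
            index.append(validate_chunk_metadata(meta))
        except InvalidChunk as err:
            dead_letter.append((meta.get("chunk_id"), str(err)))
```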

Don't derive authorization from chunk text. The decision must come from structured metadata.

Source document store (system of record)

Canonical documents, ACLs, lifecycle state, and redaction state live here. The index is always a derived copy. When there's a conflict between source and index, the source wins.

Delegated identity for service-to-service callers

Backend services calling the RAG gateway with a machine identity must propagate end-user context. Authorizing the calling service isn't the same as authorizing the end user. Reject user-scoped queries from service identities that carry no delegation context.
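Here is a minimal sketch of the delegation check. It assumes the gateway has already validated both the calling service's credential and any delegated end-user token, for example one obtained through OAuth 2.0 Token Exchange (RFC 8693). CallerContext and retrieval_principal are hypothetical names. The user_scoped flag stands in for however your gateway classifies a query.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallerContext:
    service_id: str                 # authenticated calling service
    end_user: Optional[str] = None  # subject of a validated delegated user token


def retrieval_principal(ctx, user_scoped):
    """Return the principal whose permissions govern this retrieval."""
    if not user_scoped:
        return ctx.service_id
    if ctx.end_user is None:
        # Authorizing the service is not authorizing the user: fail closed.
        raise PermissionError(f"{ctx.service_id}: user-scoped query without delegation context")
    return ctx.end_user
```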

ACL propagation and revocation

ACL changes, document deletion, and tenant offboarding must propagate to the index. The safe default is to fail closed while reindex is pending: mark affected chunks as pending_reindex and exclude them from retrieval until background cleanup catches up.
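Here is one way to express fail-closed revocation. It is sketched under the assumption of an in-memory list of chunk records. IndexedChunk, on_acl_change, on_reindex_complete, and retrievable are hypothetical names. An ACL-change event flips affected chunks to pending_reindex immediately. They become retrievable again only when chunks rebuilt from the new source version replace them.

```python
from dataclasses import dataclass


@dataclass
class IndexedChunk:
    chunk_id: str
    doc_id: str
    version: int                        # source document version at chunking time
    acl_groups: frozenset = frozenset()
    state: str = "active"


def on_acl_change(index, doc_id):
    """Fail closed: hide every active chunk of the document until reindex lands."""
    hidden = 0
    for chunk in index:
        if chunk.doc_id == doc_id and chunk.state == "active":
            chunk.state = "pending_reindex"
            hidden += 1
    return hidden


def on_reindex_complete(index, doc_id, rebuilt_chunks):
    """Swap in chunks rebuilt with fresh ACL metadata and drop the stale ones."""
    index[:] = [c for c in index if c.doc_id != doc_id] + list(rebuilt_chunks)


def retrievable(index):
    # Query-time state filter: deleted, revoked, and pending_reindex never match.
    return [c for c in index if c.state == "active"]
```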

Redaction

Chunk-level access control isn't enough if the chunk itself mixes allowed business text with restricted fields. Redact secret and restricted fields before embedding, or split them into separate chunks with appropriate classification metadata.
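For a structured record, redacting and splitting before embedding might look like this. FIELD_CLASS and REDACT_FIELDS are hypothetical mappings that stand in for the source system's field classification. Unknown fields default to restricted. That is a conservative assumption, not a rule from any particular product. ACL fields copied from the source document are omitted for brevity.

```python
# Hypothetical field classification for one record type. Classes follow the
# public / internal / restricted scheme from the chunk metadata.
FIELD_CLASS = {"title": "internal", "summary": "internal", "salary": "restricted"}
REDACT_FIELDS = {"ssn", "api_key"}  # hypothetical: never embedded at all


def split_by_sensitivity(tenant_id, doc_id, record):
    """Redact secrets, then emit one chunk per classification level."""
    buckets = {}
    for name, value in record.items():
        if name in REDACT_FIELDS:
            continue  # dropped before chunking and embedding
        cls = FIELD_CLASS.get(name, "restricted")  # unknown field: stricter class
        buckets.setdefault(cls, []).append(f"{name}: {value}")
    return [
        {"tenant_id": tenant_id, "doc_id": doc_id, "chunk_id": f"{doc_id}#{cls}",
         "classification": cls, "text": "\n".join(lines)}
        for cls, lines in sorted(buckets.items())
    ]
```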

Session isolation

Bind session state to the current principal and tenant. Clear retrieved context on principal or workspace change.
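Here is a sketch of scope-bound session state. It assumes a session object that tracks only the IDs of previously retrieved chunks. Session and bind are hypothetical names. Any change of principal, tenant, or workspace empties the retrieved context before the new scope takes effect.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    principal: str
    tenant_id: str
    workspace: str
    retrieved_chunk_ids: list = field(default_factory=list)

    def bind(self, principal, tenant_id, workspace):
        """Rebind the session; any change of scope drops prior retrieved context."""
        new_scope = (principal, tenant_id, workspace)
        if new_scope != (self.principal, self.tenant_id, self.workspace):
            self.retrieved_chunk_ids.clear()
            self.principal, self.tenant_id, self.workspace = new_scope
```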

Threat model

Baseline assumptions

  • Callers are untrusted: they can craft arbitrary prompts, retry requests, and probe for data outside their scope

  • Control plane authority: the RAG gateway derives user, tenant, and group context from verified identity, not from request body fields

  • The source document system is the authority for ACLs and lifecycle state; the retrieval index is a derived copy

  • The vector store supports hard filters, namespaces, or tenant-aware collections

  • The LLM does not understand permissions and will use any content in its context window

  • Standard infra controls such as TLS, WAF, secret management, and database AuthN are assumed to be in place. This model focuses on RAG retrieval authorization

A note on risk: you won’t fix everything

This table isn’t a checklist where every row must be fully eliminated. Focus on preventing the worst failures and limiting blast radius. In practice: ship prevention for the High rows first, then add monitoring and response for what you can’t realistically prevent.

Phase 1: Ingest and indexing

Focus: Preserving the source system's access boundary in retrieval metadata

Asset

Threat

Baseline Controls

Mitigation Options

Risk

Retrieval boundary

ACL drop: Ingest pipeline chunks and embeds a document but omits tenant or ACL metadata, leaving chunks globally searchable inside the index

Source ACL extraction

1. Required fields: Reject indexing if tenant_id or ACL metadata is missing

2. Schema validation: Validate chunk metadata in CI and at ingest time

3. Dead-letter queue: Hold malformed documents out of the searchable index

High

Authorization freshness

Stale grants: User loses access in the source system, but old ACL metadata in the index keeps returning those chunks

Periodic reindex cycle

1. Event-driven sync: Subscribe to ACL-change events where the source supports them

2. Differential polling: High-frequency comparison with a defined freshness SLA where events are unavailable

3. Fail closed: Mark affected documents pending_reindex and exclude from retrieval

High

Tenant isolation

Mixed-tenant chunks: Chunker or ETL job merges data from different tenants into one chunk or document record

Tenant ID in chunk metadata

1. Partition first: Chunk only within a single tenant or workspace boundary

2. Deterministic IDs: Carry tenant_id through every transform stage

3. Negative tests: Assert no chunk contains multiple tenant identifiers in source provenance

High

Source collection scope

Over-collection: Connector service identity can read or ingest document spaces that were never approved for retrieval, because the service account has overly broad source visibility or the sync scope is misconfigured

Source ACL extraction, connector auth

1. Scope connector identities: Limit connector service accounts to approved corpora only

2. Allow-list indexed sources: Maintain an explicit allow-list of indexed sources, spaces, or folders

3. Audit before first ingest: Review newly discovered sources before they enter the index

Medium

Deleted content

Zombie chunks: Source document is deleted or revoked, but old chunks remain queryable until the next batch rebuild

Document lifecycle state

1. Tombstones: Push delete / revoke events into the index immediately, marking chunks as non-retrievable

2. State filter: Exclude deleted, revoked, and pending_reindex chunks at query time

3. Short rebuild windows: Keep batch lag within your documented freshness SLA

Medium

Field-level secrets

Chunk overexposure: A chunk contains both allowed business text and restricted fields because chunking happened before redaction

Classification metadata

1. Redact before embedding: Remove secret and restricted fields upstream of the chunker

2. Sensitivity-aware chunking: Split restricted fields into separate chunks with appropriate classification

3. Classification filters: Exclude restricted classes for broad roles

Medium

Phase 2: Query-time retrieval

Focus: Preventing unauthorized chunks from entering the prompt

Asset

Threat

Baseline Controls

Mitigation Options

Risk

Tenant data

Filter bypass: Service runs semantic search across the whole corpus, then applies ACL checks after top-k retrieval, letting unauthorized chunks influence ranking or the prompt

Verified user identity, tenant partitioning

1. Pre-filter: Apply tenant and ACL scope inside the search request itself, before ranking

2. Partition query: Query only the caller's namespace, shard, or collection

3. Guardrails: Reject retrieval paths that cannot enforce hard pre-filters

High

Scope integrity

Client-supplied scope: Caller sends tenant_id, group_ids, or namespace names in the request body to widen retrieval beyond their grants

Verified caller identity

1. Server-side derivation: Build scope only from the verified token and policy store

2. Discard client principals: Ignore or reject client-supplied ACL fields

3. Audit: Log requested scope separately from effective scope for drift detection

High

Retrieval authorization

Delegated identity collapse: Backend service calls the RAG gateway using its own service token, and retrieval is authorized as the service instead of the end user, bypassing user-level ACLs

Verified caller identity

1. Require delegated token: User-scoped retrieval requires an end-user token or explicit token exchange

2. Bind to end-user subject: Retrieval decisions reference the delegated user, not the calling service account

3. Reject generic callers: Block user-scoped queries from service identities without delegation context

High

Availability

Fail-open fallback: Policy resolver, IdP, or filter path is slow or unavailable, and the service retries with an unfiltered search to keep answers flowing

Policy evaluation required

1. Fail closed: Return an error when effective scope cannot be resolved

2. Bounded cache: Cache short-lived policy snapshots per principal, only if bounded and auditable

3. Chaos tests: Prove that policy-store outage does not widen retrieval scope

High

Permission cache

Cache staleness: Cached group memberships or policy snapshots outlive a revocation, granting access after it was removed

Policy cache in use

1. Short TTLs: Cache policy snapshots for minutes, not hours

2. Invalidation events: Subscribe to revocation signals to bust cache entries early

3. Freshness header: Require cache entries to carry a max-age and fail closed when expired

Medium

Query privacy

Side-channel probing: Caller probes the corpus with repeated queries and infers document existence from score changes, hit counts, or denial wording

Opaque document IDs

1. Uniform responses: Do not reveal whether inaccessible documents exist

2. Suppress telemetry: No global hit counts outside the authorized scope

3. Rate limits: Detect repeated probing across document names or project terms

Medium
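The permission-cache and fail-open rows above can be combined into one small component. This sketch assumes a resolver callable that returns a policy snapshot or raises when the policy store is unavailable. PolicyCache and its parameters are hypothetical, and the 120-second default is illustrative, not a recommendation. Expired entries are never served as a fallback. If the refresh fails, the exception propagates and the request fails closed. The injectable clock exists only to make expiry testable.

```python
import time


class PolicyCache:
    """Bounded, short-lived cache of per-principal policy snapshots."""

    def __init__(self, resolver, max_age_s=120.0, max_entries=10_000, clock=time.monotonic):
        self._resolver = resolver
        self._max_age = max_age_s
        self._max_entries = max_entries
        self._clock = clock
        self._entries = {}  # principal -> (fetched_at, snapshot)

    def invalidate(self, principal):
        """Call on a revocation event to bust the entry early."""
        self._entries.pop(principal, None)

    def get(self, principal):
        now = self._clock()
        hit = self._entries.get(principal)
        if hit is not None and now - hit[0] <= self._max_age:
            return hit[1]
        # Missing or expired: drop it and ask the resolver. If the resolver
        # raises, nothing stale is returned and the caller fails closed.
        self._entries.pop(principal, None)
        snapshot = self._resolver(principal)
        if self._entries and len(self._entries) >= self._max_entries:
            self._entries.pop(next(iter(self._entries)))  # evict oldest insertion
        self._entries[principal] = (now, snapshot)
        return snapshot
```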

Phase 3: Prompt assembly and response

Focus: Keeping unauthorized context out of the answer path

Asset

Threat

Baseline Controls

Mitigation Options

Risk

Session context

Context bleed: Previously retrieved content leaks across users, tenants, or workspace switches because session context is reused without clearing

Authenticated sessions

1. Scope binding: Bind conversation state to principal and tenant

2. Context reset: Clear retrieved chunks on principal or workspace change

3. Isolation tests: Verify that switching context produces zero carryover from prior scope

High

Response integrity

Citation mismatch: Model answers with claims not grounded in approved chunks, or cites documents outside the filtered retrieval set

Citations enabled

1. Citation binding: Return only citations from retrieved chunk IDs

2. Post-check: Reject or downgrade answers referencing documents outside the approved set

3. Safe refusal: Prefer "I don't have enough information" over speculative synthesis

Medium

Audit trail

Silent leakage: Service cannot reconstruct which chunks reached the model, so an access incident cannot be investigated

Request logs

1. Retrieval manifest: Record request_id, principal, effective scope, and retrieved chunk IDs per request

2. Prompt manifest: Persist the final approved context list

3. Alerts: Trigger on cross-tenant retrieval attempts or sudden spikes in denied queries

Low
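Citation binding, the safe refusal, and the retrieval manifest can share one final step. This sketch assumes the model returns its answer together with a list of cited chunk IDs, and that approved chunks are dictionaries with chunk_id and doc_id. finalize_answer and SAFE_REFUSAL are hypothetical names. Treating an answer with no citations as ungrounded is an assumed policy choice. The audit entry is serialized as JSON so any log pipeline can ingest it.

```python
import json

SAFE_REFUSAL = "I don't have enough information to answer that."


def finalize_answer(request_id, principal, scope, approved_chunks,
                    answer, cited_ids, audit_log):
    """Return the answer only if every citation is bound to an approved chunk."""
    approved = {c["chunk_id"]: c["doc_id"] for c in approved_chunks}
    outside = [cid for cid in cited_ids if cid not in approved]
    # Assumed policy: uncited answers are treated as ungrounded.
    refused = bool(outside) or not cited_ids
    audit_log.append(json.dumps({
        "request_id": request_id,
        "principal": principal,
        "effective_scope": scope,
        "retrieved_chunk_ids": sorted(approved),
        "cited_doc_ids": sorted({approved[c] for c in cited_ids if c in approved}),
        "rejected_citations": outside,
        "refused": refused,
    }, sort_keys=True))
    return SAFE_REFUSAL if refused else answer
```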

If you rely on post-retrieval trimming instead

If retrieval searches the whole corpus and your application removes unauthorized hits after ranking, the threat profile shifts:

  • Unauthorized chunks influence ranking and similarity scores before they're dropped

  • Blocked results occupy candidate slots, making top-k recall unpredictable

  • Caching layers may store broad search results that later callers shouldn't see

  • Debugging pressure creates dangerous shortcuts: teams start requesting raw unfiltered hits

Post-retrieval trimming isn't a safe default. Use it only as an additional check after hard retrieval boundaries, not instead of them.
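A toy comparison shows the candidate-slot problem. The scores and the allowed flag below are invented for illustration. post_filter_top_k and pre_filter_top_k are hypothetical helpers built on the standard library's heapq.nlargest.

```python
import heapq

# Invented similarity scores for one query; the flag marks whether the
# caller's ACL allows the chunk.
CANDIDATES = [
    ("x1", 0.97, False),
    ("x2", 0.95, False),
    ("x3", 0.93, False),
    ("a1", 0.90, True),
    ("a2", 0.88, True),
    ("a3", 0.85, True),
]


def post_filter_top_k(k):
    """Unsafe pattern: rank the whole corpus, then trim unauthorized hits."""
    top = heapq.nlargest(k, CANDIDATES, key=lambda c: c[1])
    return [cid for cid, _, allowed in top if allowed]


def pre_filter_top_k(k):
    """Safe default: restrict candidates to the caller's scope, then rank."""
    allowed = [c for c in CANDIDATES if c[2]]
    return [cid for cid, _, _ in heapq.nlargest(k, allowed, key=lambda c: c[1])]
```

With k=3, the post-filtered path returns nothing even though three authorized chunks exist, because unauthorized chunks took every candidate slot. The pre-filtered path returns a1, a2, and a3. In the post-filter path, the unauthorized chunks also passed through ranking before being dropped.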

Verification checklist

  • Identity and scope

    • A request with client-supplied tenant_id, group_ids, namespace names, or raw ACL filters is ignored or rejected

    • A backend service calling the RAG gateway without delegated end-user context can't perform user-scoped retrieval

    • If the IdP or policy resolver is unavailable, the request fails closed and no unfiltered retrieval occurs

    • A request with the same query but a narrower delegated identity returns a strictly narrower set of approved chunks, even when sent through the same backend service

  • Ingest and metadata

    • Ingesting a document without ACL metadata fails; no searchable chunks are written

    • A chunk missing tenant_id, doc_id, chunk_id, ACL metadata, version, or state is rejected by ingest validation

    • A mixed-tenant source payload is rejected and doesn't produce a merged chunk

    • Restricted fields are redacted or split before chunking and embedding

    • A connector whose service account gains read access to an unapproved source space doesn't index that content until it appears on the allow-list

  • Revocation and deletion

    • Removing a user from a source group blocks retrieval within the documented freshness SLA

    • Deleting or revoking a document marks affected chunks non-retrievable before background cleanup completes

    • If reindex is pending after an ACL change, affected content is excluded from retrieval until new metadata is active

  • Retrieval enforcement

    • A caller querying with Tenant A credentials for a document known to exist in Tenant B receives zero chunks, zero hit counts, and no signal distinguishing "does not exist" from "not authorized"

    • Retrieval executes inside the authorized partition or hard filter before ranking returns results

    • Post-filter-only retrieval paths are disabled or blocked in production

  • Prompt assembly and response

    • Switching tenant, workspace, or principal clears prior retrieval context from the session

    • Every citation in the answer maps to a chunk returned by the filtered retrieval step

    • Answers referencing documents outside the approved set are rejected or downgraded to a safe refusal

  • Logging and detection

    • Audit logs record request_id, principal, effective scope, retrieved chunk IDs, cited document IDs, and denial events

    • Alerts fire on repeated scope-probing attempts, cross-tenant retrieval attempts, or sudden spikes in denied queries

    • Post-hoc audit can reconstruct which chunks contributed to any historical response

Implementation & Review

The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
