Source systems have permissions. Vector indexes have data and metadata. If you don't carry authorization into ingestion and retrieval, the index becomes a second copy of your documents with weaker access control. In a RAG system, retrieval is the first authorization decision, not a relevance-only step. If retrieval runs before tenant, document, or field-level checks, the model can summarize data the caller was never allowed to see. This post documents a safe default for preserving source permissions through the retrieval layer and its trade-offs.
System description
A RAG access control system resolves caller identity, derives an authorization scope from verified credentials and policy, retrieves only chunks bound to that scope, and sends only approved chunks to the model. The source document system is the authority for permissions and lifecycle state. The retrieval index is a projection, not the source of truth.

Architecture choice
You can enforce source permissions in retrieval two ways. The security trade-offs change depending on which you pick.
Partition first
Each tenant, workspace, or classification tier gets a separate namespace, shard, or collection. Queries still enforce caller scope, but the first boundary is physical or logical partitioning.
Use this when:
Tenant isolation matters more than cross-tenant recall
You need lower blast radius from application bugs
Index size or workload patterns already justify separate partitions
Trade-off: Partitioning simplifies isolation but complicates index management, rebalancing, and shared-content workflows.
Filter within partition
Each query carries an authorization scope derived server-side, and retrieval applies it as a hard filter before ranking returns chunks. Document-level or chunk-level ACL metadata controls who sees what inside a partition.
Use this when:
Permissions are fine-grained (per-document or per-folder)
Users have overlapping access across many document collections
You need permission changes to propagate without re-partitioning the store
Main risks: Stale ACL metadata, weak scope derivation, and filter bypass in fallback paths.
Common middle ground: Partition by tenant or workspace first, then apply document-level ACL filters within that partition. Per-chunk ACL filters do not replace a tenant boundary.
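As a rough illustration of the middle ground, the sketch below selects a tenant partition first and then applies a group ACL filter before any scoring happens. The in-memory `partitions` dict, the `Chunk` type, and the `retrieve` function are all hypothetical stand-ins for a real vector store, which would express the same two boundaries as a namespace plus a metadata filter.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    tenant_id: str
    acl_groups: frozenset
    embedding: tuple


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(partitions, tenant_id, caller_groups, query_vec, k=3):
    # Boundary 1: only the caller's tenant partition is ever touched.
    partition = partitions.get(tenant_id, [])
    # Boundary 2: the ACL filter runs before any similarity scoring.
    allowed = [c for c in partition if c.acl_groups & caller_groups]
    ranked = sorted(allowed, key=lambda c: cosine(c.embedding, query_vec), reverse=True)
    return ranked[:k]


partitions = {
    "tenant-a": [
        Chunk("a1", "tenant-a", frozenset({"eng"}), (1.0, 0.0)),
        Chunk("a2", "tenant-a", frozenset({"finance"}), (0.9, 0.1)),
    ],
    "tenant-b": [Chunk("b1", "tenant-b", frozenset({"eng"}), (1.0, 0.0))],
}

hits = retrieve(partitions, "tenant-a", frozenset({"eng"}), (1.0, 0.0))
```

Here the `eng` caller in `tenant-a` never scores `b1`, even though it carries the same group name, because the partition lookup happens before the ACL check.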
Golden path
Build this first. Then relax constraints only if you have a specific reason:
Authenticate caller → derive principal server-side → resolve retrieval scope → query authorized partition → verify chunk freshness → send approved chunks to model → bind citations to chunk IDs → log scope and denials
Each step is a gate. If any gate cannot resolve (identity unavailable, scope ambiguous, freshness unverifiable), fail closed. Failing closed raises the denial rate during outages and reindex lag, but that cost is lower than silently widening retrieval scope.
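The gate sequence might be sketched as below. This is a minimal outline, not a full gateway: `run_golden_path`, `FailClosed`, and the four stub collaborators are hypothetical names, and model invocation and logging are left out.

```python
class FailClosed(Exception):
    """Raised when a gate cannot reach a decision; the request is denied."""


def run_golden_path(token, query, authenticate, resolve_scope, search, is_fresh):
    # Gate 1: identity must resolve to a principal.
    principal = authenticate(token)
    if principal is None:
        raise FailClosed("identity unavailable")

    # Gate 2: scope is derived server-side; a missing or empty scope is a denial.
    scope = resolve_scope(principal)
    if not scope:
        raise FailClosed("scope ambiguous or empty")

    # Gate 3: search only inside the resolved scope.
    candidates = search(query, scope)

    # Gate 4: keep a chunk only when freshness is positively confirmed.
    approved = [c for c in candidates if is_fresh(c) is True]

    # The approved chunk IDs are the only citations the answer may carry.
    return {
        "principal": principal,
        "scope": scope,
        "chunk_ids": [c["chunk_id"] for c in approved],
    }


# Stub collaborators, for illustration only.
def authenticate(token):
    return {"good-token": "alice"}.get(token)


def resolve_scope(principal):
    return {"alice": {"tenant_id": "tenant-a"}}.get(principal)


def search(query, scope):
    # The second chunk has unknown freshness and must be dropped.
    return [{"chunk_id": "a1", "fresh": True}, {"chunk_id": "a2", "fresh": None}]


def is_fresh(chunk):
    return chunk["fresh"]


result = run_golden_path("good-token", "q", authenticate, resolve_scope, search, is_fresh)
```

A chunk whose freshness comes back as `None` is treated the same as a stale one, which is the fail-closed behavior described above.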
Core design
RAG gateway (control plane)
Authenticates the caller and derives principal and group membership from the identity provider. From there, it builds the retrieval scope using the policy store and decides which chunks may reach the model. The gateway is the only component that constructs vector queries.
The scope resolver maps verified identity to allowed tenants, workspaces, document classes, and optional field restrictions. Scope comes from the resolver, never from the client request.
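One way to picture the resolver is a function that takes verified token claims plus the request body and refuses to read scope from the latter. Everything here is an assumption made for illustration: the `POLICY_STORE` dict, the claim names `tenant` and `sub`, and the list of forbidden body fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalScope:
    tenant_id: str
    groups: frozenset
    max_classification: str


# Hypothetical policy store keyed by (tenant, subject).
POLICY_STORE = {
    ("tenant-a", "alice"): {"groups": {"eng"}, "max_classification": "internal"},
}

# Body fields a client must never use to influence scope.
FORBIDDEN_CLIENT_FIELDS = {"tenant_id", "group_ids", "namespace", "acl_filter"}


def resolve_scope(verified_claims, request_body):
    # Reject, rather than silently merge, any client-supplied scope fields.
    supplied = FORBIDDEN_CLIENT_FIELDS.intersection(request_body)
    if supplied:
        raise PermissionError(f"client-supplied scope fields: {sorted(supplied)}")

    # Scope comes only from verified claims and the policy store.
    policy = POLICY_STORE.get((verified_claims["tenant"], verified_claims["sub"]))
    if policy is None:
        raise PermissionError("no policy for principal")

    return RetrievalScope(
        tenant_id=verified_claims["tenant"],
        groups=frozenset(policy["groups"]),
        max_classification=policy["max_classification"],
    )


scope = resolve_scope({"tenant": "tenant-a", "sub": "alice"}, {"query": "roadmap"})
```

This sketch rejects requests that carry scope fields. Ignoring them is also a valid choice, as long as the requested and effective scope are logged separately.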
Retrieval index (data plane)
Stores chunk embeddings alongside security metadata. The index doesn't enforce application-level access control on its own, but it supports namespace partitioning and metadata filtering.
Minimum chunk metadata: tenant_id, doc_id, chunk_id, acl_principals or acl_groups, classification (public, internal, restricted), version (source document version at chunking time), state (active, deleted, revoked, pending_reindex).
Don't derive authorization from chunk text. The decision must come from structured metadata.
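A minimal ingest-time validator for the metadata listed above could look like the following. The field names, classification values, and state values come from this post. The `validate_chunk_metadata` function and its error strings are illustrative only.

```python
REQUIRED_FIELDS = ("tenant_id", "doc_id", "chunk_id", "classification", "version", "state")
CLASSIFICATIONS = {"public", "internal", "restricted"}
STATES = {"active", "deleted", "revoked", "pending_reindex"}


def validate_chunk_metadata(meta):
    """Return a list of problems; an empty list means the chunk may be indexed."""
    errors = []
    for field in REQUIRED_FIELDS:
        if meta.get(field) in (None, ""):
            errors.append(f"missing {field}")

    # At least one ACL field must be present and non-empty.
    if not (meta.get("acl_principals") or meta.get("acl_groups")):
        errors.append("missing ACL metadata")

    # Only check enumerated values when the field is present at all.
    classification = meta.get("classification")
    if classification not in (None, "") and classification not in CLASSIFICATIONS:
        errors.append(f"invalid classification: {classification}")
    state = meta.get("state")
    if state not in (None, "") and state not in STATES:
        errors.append(f"invalid state: {state}")
    return errors


good = {
    "tenant_id": "tenant-a", "doc_id": "d1", "chunk_id": "d1-0",
    "acl_groups": ["eng"], "classification": "internal",
    "version": 3, "state": "active",
}
bad = {"doc_id": "d1", "chunk_id": "d1-0", "classification": "secret", "version": 3, "state": "active"}
```

A chunk that yields any error would go to a dead-letter queue rather than the searchable index.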
Source document store (system of record)
Canonical documents, ACLs, lifecycle state, and redaction state live here. The index is always a derived copy. When there's a conflict between source and index, the source wins.
Delegated identity for service-to-service callers
Backend services calling the RAG gateway with a machine identity must propagate end-user context. Authorizing the calling service isn't the same as authorizing the end user. Reject user-scoped queries from service identities that carry no delegation context.
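The check can be sketched as a small function that returns the subject retrieval must be authorized against. The claim names `subject_type` and `on_behalf_of` are hypothetical placeholders. A real deployment would use whatever its token-exchange mechanism actually issues.

```python
def effective_subject(claims):
    """Return the end-user subject that retrieval must be authorized against."""
    subject_type = claims.get("subject_type")
    if subject_type == "user":
        return claims["sub"]
    if subject_type == "service":
        # A service may act only on behalf of a named end user.
        delegated = claims.get("on_behalf_of")
        if not delegated:
            raise PermissionError("service identity without delegation context")
        return delegated
    raise PermissionError("unknown subject type")


user_subject = effective_subject({"subject_type": "user", "sub": "alice"})
delegated_subject = effective_subject(
    {"subject_type": "service", "sub": "svc-chat", "on_behalf_of": "alice"}
)
```

Note that the service's own `sub` never becomes the authorization subject. It is useful for audit logs, not for scope.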
ACL propagation and revocation
ACL changes, document deletion, and tenant offboarding must propagate to the index. The safe default is to fail closed while reindex is pending: mark affected chunks as pending_reindex and exclude them from retrieval immediately, without waiting for background cleanup to catch up.
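The fail-closed default can be illustrated with a state flag and a query-time filter. The list-of-dicts `index` and the function names are hypothetical. In a real store the same idea would be a metadata update followed by a `state` filter on every query.

```python
RETRIEVABLE_STATES = {"active"}


def on_acl_change(index, doc_id):
    # Flip state first; re-embedding and cleanup can happen later.
    for chunk in index:
        if chunk["doc_id"] == doc_id:
            chunk["state"] = "pending_reindex"


def retrievable(index):
    # Query-time state filter: anything not explicitly active is excluded.
    return [c["chunk_id"] for c in index if c["state"] in RETRIEVABLE_STATES]


index = [
    {"chunk_id": "d1-0", "doc_id": "d1", "state": "active"},
    {"chunk_id": "d1-1", "doc_id": "d1", "state": "active"},
    {"chunk_id": "d2-0", "doc_id": "d2", "state": "active"},
]

before = retrievable(index)
on_acl_change(index, "d1")
after = retrievable(index)
```

Using an allow-list of retrievable states, rather than a deny-list, means any state added later is excluded by default.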
Redaction
Chunk-level access control isn't enough if the chunk itself mixes allowed business text with restricted fields. Redact secret and restricted fields before embedding, or split them into separate chunks with appropriate classification metadata.
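The splitting option might look like this sketch, which separates restricted fields from a structured record before anything is embedded. The field names in `RESTRICTED_FIELDS` and the `build_chunks` helper are assumptions for illustration. Embedding itself is omitted.

```python
# Hypothetical set of fields that must never share a chunk with general text.
RESTRICTED_FIELDS = {"salary", "national_id"}


def build_chunks(record, tenant_id, doc_id):
    def render(fields):
        return "; ".join(f"{k}: {v}" for k, v in fields.items())

    general = {k: v for k, v in record.items() if k not in RESTRICTED_FIELDS}
    restricted = {k: v for k, v in record.items() if k in RESTRICTED_FIELDS}

    chunks = [{
        "tenant_id": tenant_id, "doc_id": doc_id,
        "classification": "internal", "text": render(general),
    }]
    if restricted:
        # Restricted fields get their own chunk and classification.
        chunks.append({
            "tenant_id": tenant_id, "doc_id": doc_id,
            "classification": "restricted", "text": render(restricted),
        })
    return chunks


chunks = build_chunks(
    {"name": "Ada", "team": "Platform", "salary": "100000"},
    tenant_id="tenant-a", doc_id="hr-17",
)
```

Because the split happens upstream of the chunker, a classification filter at query time can exclude the restricted chunk without touching the general one.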
Session isolation
Bind session state to the current principal and tenant. Clear retrieved context on principal or workspace change.
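A small sketch of that binding, assuming a hypothetical in-process `Session` object that holds retrieved chunk IDs:

```python
class Session:
    def __init__(self):
        self.binding = None   # (principal, tenant) the context belongs to
        self.context = []     # retrieved chunk IDs for the current binding

    def bind(self, principal, tenant):
        # Any change of principal or tenant discards prior retrieved context.
        if self.binding != (principal, tenant):
            self.context.clear()
            self.binding = (principal, tenant)

    def add_context(self, chunk_ids):
        if self.binding is None:
            raise RuntimeError("session is not bound to a principal")
        self.context.extend(chunk_ids)


session = Session()
session.bind("alice", "tenant-a")
session.add_context(["a1", "a2"])
session.bind("alice", "tenant-a")      # same binding: context is kept
kept = list(session.context)
session.bind("alice", "tenant-b")      # workspace switch: context is cleared
cleared = list(session.context)
```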
Threat model
Baseline assumptions
Callers are untrusted: they can craft arbitrary prompts, retry requests, and probe for data outside their scope
Control plane authority: the RAG gateway derives user, tenant, and group context from verified identity, not from request body fields
The source document system is the authority for ACLs and lifecycle state; the retrieval index is a derived copy
The vector store supports hard filters, namespaces, or tenant-aware collections
The LLM does not understand permissions and will use any content in its context window
Standard infra controls such as TLS, WAF, secret management, and database AuthN are assumed to be in place. This model focuses on RAG retrieval authorization
A note on risk: you won’t fix everything
The tables below aren't a checklist where every row must be fully eliminated. Focus on preventing the worst failures and limiting blast radius. In practice: ship prevention for the High rows first, then add monitoring and response for what you can't realistically prevent.
Phase 1: Ingest and indexing
Focus: Preserving the source system's access boundary in retrieval metadata
| Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
| Retrieval boundary | ACL drop: Ingest pipeline chunks and embeds a document but omits tenant or ACL metadata, leaving chunks globally searchable inside the index | Source ACL extraction | 1. Required fields: Reject indexing if tenant_id or ACL metadata is missing 2. Schema validation: Validate chunk metadata in CI and at ingest time 3. Dead-letter queue: Hold malformed documents out of the searchable index | High |
| Authorization freshness | Stale grants: User loses access in the source system, but old ACL metadata in the index keeps returning those chunks | Periodic reindex cycle | 1. Event-driven sync: Subscribe to ACL-change events where the source supports them 2. Differential polling: High-frequency comparison with a defined freshness SLA where events are unavailable 3. Fail closed: Mark affected documents pending_reindex and exclude from retrieval | High |
| Tenant isolation | Mixed-tenant chunks: Chunker or ETL job merges data from different tenants into one chunk or document record | Tenant ID in chunk metadata | 1. Partition first: Chunk only within a single tenant or workspace boundary 2. Deterministic IDs: Carry tenant_id through every transform stage 3. Negative tests: Assert no chunk contains multiple tenant identifiers in source provenance | High |
| Source collection scope | Over-collection: Connector service identity can read or ingest document spaces that were never approved for retrieval, because the service account has overly broad source visibility or the sync scope is misconfigured | Source ACL extraction, connector auth | 1. Scope connector identities: Limit connector service accounts to approved corpora only 2. Allow-list indexed sources: Maintain an explicit allow-list of indexed sources, spaces, or folders 3. Audit before first ingest: Review newly discovered sources before they enter the index | Medium |
| Deleted content | Zombie chunks: Source document is deleted or revoked, but old chunks remain queryable until the next batch rebuild | Document lifecycle state | 1. Tombstones: Push delete / revoke events into the index immediately, marking chunks as non-retrievable 2. State filter: Exclude deleted, revoked, and pending_reindex chunks at query time 3. Short rebuild windows: Keep batch lag within your documented freshness SLA | Medium |
| Field-level secrets | Chunk overexposure: A chunk contains both allowed business text and restricted fields because chunking happened before redaction | Classification metadata | 1. Redact before embedding: Remove secret and restricted fields upstream of the chunker 2. Sensitivity-aware chunking: Split restricted fields into separate chunks with appropriate classification 3. Classification filters: Exclude restricted classes for broad roles | Medium |
Phase 2: Query-time retrieval
Focus: Preventing unauthorized chunks from entering the prompt
| Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
| Tenant data | Filter bypass: Service runs semantic search across the whole corpus, then applies ACL checks after top-k retrieval, letting unauthorized chunks influence ranking or the prompt | Verified user identity, tenant partitioning | 1. Pre-filter: Apply tenant and ACL scope inside the search request itself, before ranking 2. Partition query: Query only the caller's namespace, shard, or collection 3. Guardrails: Reject retrieval paths that cannot enforce hard pre-filters | High |
| Scope integrity | Client-supplied scope: Caller sends tenant_id, group_ids, or namespace names in the request body to widen retrieval beyond their grants | Verified caller identity | 1. Server-side derivation: Build scope only from the verified token and policy store 2. Discard client principals: Ignore or reject client-supplied ACL fields 3. Audit: Log requested scope separately from effective scope for drift detection | High |
| Retrieval authorization | Delegated identity collapse: Backend service calls the RAG gateway using its own service token, and retrieval is authorized as the service instead of the end user, bypassing user-level ACLs | Verified caller identity | 1. Require delegated token: User-scoped retrieval requires an end-user token or explicit token exchange 2. Bind to end-user subject: Retrieval decisions reference the delegated user, not the calling service account 3. Reject generic callers: Block user-scoped queries from service identities without delegation context | High |
| Availability | Fail-open fallback: Policy resolver, IdP, or filter path is slow or unavailable, and the service retries with an unfiltered search to keep answers flowing | Policy evaluation required | 1. Fail closed: Return an error when effective scope cannot be resolved 2. Bounded cache: Cache short-lived policy snapshots per principal, only if bounded and auditable 3. Chaos tests: Prove that policy-store outage does not widen retrieval scope | High |
| Permission cache | Cache staleness: Cached group memberships or policy snapshots outlive a revocation, granting access after it was removed | Policy cache in use | 1. Short TTLs: Cache policy snapshots for minutes, not hours 2. Invalidation events: Subscribe to revocation signals to bust cache entries early 3. Freshness header: Require cache entries to carry a max-age and fail closed when expired | Medium |
| Query privacy | Side-channel probing: Caller probes the corpus with repeated queries and infers document existence from score changes, hit counts, or denial wording | Opaque document IDs | 1. Uniform responses: Do not reveal whether inaccessible documents exist 2. Suppress telemetry: No global hit counts outside the authorized scope 3. Rate limits: Detect repeated probing across document names or project terms | Medium |
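The permission-cache mitigations (short TTLs, early invalidation, fail closed on expiry) can be sketched as a small cache that never returns an expired entry. The `PolicyCache` class is hypothetical, and the injectable `clock` exists only so the expiry path can be exercised without sleeping.

```python
import time


class PolicyCache:
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # principal -> (scope, expires_at)

    def put(self, principal, scope):
        self._entries[principal] = (scope, self._clock() + self._ttl)

    def invalidate(self, principal):
        # Called from a revocation signal to drop the entry early.
        self._entries.pop(principal, None)

    def get(self, principal):
        entry = self._entries.get(principal)
        if entry is None:
            raise LookupError("no cached policy")
        scope, expires_at = entry
        if self._clock() >= expires_at:
            # Expired entries are never served; the caller must re-resolve or deny.
            del self._entries[principal]
            raise LookupError("cached policy expired")
        return scope


now = [0.0]                                   # fake clock for the example
cache = PolicyCache(ttl_seconds=300, clock=lambda: now[0])
cache.put("alice", {"tenant_id": "tenant-a"})
fresh_scope = cache.get("alice")
now[0] = 301.0                                # just past the five-minute TTL
```

A `LookupError` here should lead to a fresh policy lookup or a denial, never to an unfiltered search.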
Phase 3: Prompt assembly and response
Focus: Keeping unauthorized context out of the answer path
| Asset | Threat | Baseline Controls | Mitigation Options | Risk |
|---|---|---|---|---|
| Session context | Context bleed: Previously retrieved content leaks across users, tenants, or workspace switches because session context is reused without clearing | Authenticated sessions | 1. Scope binding: Bind conversation state to principal and tenant 2. Context reset: Clear retrieved chunks on principal or workspace change 3. Isolation tests: Verify that switching context produces zero carryover from prior scope | High |
| Response integrity | Citation mismatch: Model answers with claims not grounded in approved chunks, or cites documents outside the filtered retrieval set | Citations enabled | 1. Citation binding: Return only citations from retrieved chunk IDs 2. Post-check: Reject or downgrade answers referencing documents outside the approved set 3. Safe refusal: Prefer "I don't have enough information" over speculative synthesis | Medium |
| Audit trail | Silent leakage: Service cannot reconstruct which chunks reached the model, so an access incident cannot be investigated | Request logs | 1. Retrieval manifest: Record request_id, principal, effective scope, and retrieved chunk IDs per request 2. Prompt manifest: Persist the final approved context list 3. Alerts: Trigger on cross-tenant retrieval attempts or sudden spikes in denied queries | Low |
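Citation binding and the retrieval manifest are both simple to prototype. The sketch below is one possible shape. The function names, the response dict, and the JSON layout are assumptions, while the manifest fields follow the ones named in the table.

```python
import json


def bind_citations(cited_chunk_ids, retrieved_chunk_ids):
    # An answer may cite only chunks that the filtered retrieval returned.
    approved = set(retrieved_chunk_ids)
    outside = sorted(set(cited_chunk_ids) - approved)
    if outside:
        return {"status": "refused", "citations": [], "outside_approved_set": outside}
    return {"status": "ok", "citations": list(cited_chunk_ids), "outside_approved_set": []}


def retrieval_manifest(request_id, principal, effective_scope, retrieved_chunk_ids):
    # One record per request, so an incident can be reconstructed later.
    return json.dumps(
        {
            "request_id": request_id,
            "principal": principal,
            "effective_scope": effective_scope,
            "retrieved_chunk_ids": list(retrieved_chunk_ids),
        },
        sort_keys=True,
    )


ok = bind_citations(["a1"], ["a1", "a2"])
refused = bind_citations(["a1", "b9"], ["a1", "a2"])
manifest = retrieval_manifest("req-1", "alice", {"tenant_id": "tenant-a"}, ["a1", "a2"])
```

Refusing the whole answer on any out-of-set citation is the stricter of the two options in the table. Downgrading would strip the offending citation and flag the answer instead.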
If you rely on post-retrieval trimming instead
If retrieval searches the whole corpus and your application removes unauthorized hits after ranking, the threat profile shifts:
Unauthorized chunks influence ranking and similarity scores before they're dropped
Blocked results occupy candidate slots, making top-k recall unpredictable
Caching layers may store broad search results that later callers shouldn't see
Debugging pressure creates dangerous shortcuts: teams start requesting raw unfiltered hits
Post-retrieval trimming isn't a safe default. Use it only as an additional check after hard retrieval boundaries, not instead of them.
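The recall problem is easy to reproduce with a toy corpus. In this sketch the scores are made-up similarity values and both functions are hypothetical. The point is only the order of filtering and top-k selection.

```python
# (chunk_id, tenant_id, similarity score); scores are invented for the example.
corpus = [
    ("b1", "tenant-b", 0.99),
    ("b2", "tenant-b", 0.98),
    ("a1", "tenant-a", 0.90),
    ("a2", "tenant-a", 0.80),
]


def post_filter(corpus, tenant_id, k):
    # Rank the whole corpus, take top-k, then drop unauthorized hits.
    top = sorted(corpus, key=lambda c: c[2], reverse=True)[:k]
    return [c[0] for c in top if c[1] == tenant_id]


def pre_filter(corpus, tenant_id, k):
    # Drop unauthorized chunks first, then rank and take top-k.
    allowed = [c for c in corpus if c[1] == tenant_id]
    top = sorted(allowed, key=lambda c: c[2], reverse=True)[:k]
    return [c[0] for c in top]


trimmed = post_filter(corpus, "tenant-a", k=2)
filtered = pre_filter(corpus, "tenant-a", k=2)
```

With `k=2`, the post-filter path returns nothing for `tenant-a` because both candidate slots were taken by another tenant's chunks, while the pre-filter path returns `a1` and `a2`.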
Verification checklist
Identity and scope
A request with client-supplied tenant_id, group_ids, namespace names, or raw ACL filters is ignored or rejected
A backend service calling the RAG gateway without delegated end-user context can't perform user-scoped retrieval
If the IdP or policy resolver is unavailable, the request fails closed and no unfiltered retrieval occurs
A request with the same query but a narrower delegated identity returns a strictly narrower set of approved chunks, even when sent through the same backend service
Ingest and metadata
Ingesting a document without ACL metadata fails; no searchable chunks are written
A chunk missing tenant_id, doc_id, chunk_id, ACL metadata, version, or state is rejected by ingest validation
A mixed-tenant source payload is rejected and doesn't produce a merged chunk
Restricted fields are redacted or split before chunking and embedding
A connector whose service account gains read access to an unapproved source space doesn't index that content until it appears on the allow-list
Revocation and deletion
Removing a user from a source group blocks retrieval within the documented freshness SLA
Deleting or revoking a document marks affected chunks non-retrievable before background cleanup completes
If reindex is pending after an ACL change, affected content is excluded from retrieval until new metadata is active
Retrieval enforcement
A caller querying with Tenant A credentials for a document known to exist in Tenant B receives zero chunks, zero hit counts, and no signal distinguishing "does not exist" from "not authorized"
Retrieval executes inside the authorized partition or hard filter before ranking returns results
Post-filter-only retrieval paths are disabled or blocked in production
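The cross-tenant check above translates naturally into an automated test. The sketch below assumes a hypothetical `fetch_document_chunks` function over an in-memory index and asserts that a document in another tenant is indistinguishable from one that does not exist.

```python
NOT_FOUND = {"chunks": [], "hit_count": 0, "message": "No matching content."}


def fetch_document_chunks(index, scope_tenant, doc_id):
    # The tenant check is part of the match, so out-of-scope documents
    # fall through to exactly the same response as missing ones.
    hits = [
        c["chunk_id"]
        for c in index
        if c["tenant_id"] == scope_tenant and c["doc_id"] == doc_id
    ]
    if not hits:
        return dict(NOT_FOUND)
    return {"chunks": hits, "hit_count": len(hits), "message": "ok"}


index = [
    {"chunk_id": "a1", "tenant_id": "tenant-a", "doc_id": "doc-a"},
    {"chunk_id": "b1", "tenant_id": "tenant-b", "doc_id": "doc-b"},
]

other_tenant = fetch_document_chunks(index, "tenant-a", "doc-b")
missing = fetch_document_chunks(index, "tenant-a", "doc-missing")
own = fetch_document_chunks(index, "tenant-a", "doc-a")
```

Comparing whole response objects, rather than only the chunk list, also catches leaks through hit counts or denial wording.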
Prompt assembly and response
Switching tenant, workspace, or principal clears prior retrieval context from the session
Every citation in the answer maps to a chunk returned by the filtered retrieval step
Answers referencing documents outside the approved set are rejected or downgraded to a safe refusal
Logging and detection
Audit logs record request_id, principal, effective scope, retrieved chunk IDs, cited document IDs, and denial events
Alerts fire on repeated scope-probing attempts, cross-tenant retrieval attempts, or sudden spikes in denied queries
Post-hoc audit can reconstruct which chunks contributed to any historical response
Implementation & Review
The full threat model matrix, architectural diagrams, and a printable verification checklist for this pattern are available in the Secure Patterns repository. Use these artifacts to guide your design reviews and internal audits.
