The RAG Chatbot Dilemma: Should You Respect 3rd-Party Permissions or Sync to Your Own System?

Hazal Mestci

As more teams build RAG-based LLM applications — like chatbots that answer questions over a user’s Google Drive, Notion workspace, or Jira instance — one deceptively simple question keeps surfacing.

“How do we keep this thing from leaking data?”

It came up in a recent engineering thread, and I’ve seen it surface again and again.

If you’re working on secure AI systems that interact with third-party data, this post walks through the core architecture patterns and trade-offs you’ll need to consider.

The Problem: Filtering Retrieved Data Based on Permissions

Say you’re building a chatbot that can summarize and answer questions over a user’s Google Drive. You’re converting the content to embeddings, storing them in a vector database, and retrieving similar documents via similarity search. Alice uses your chatbot to ask a question:

“How does the company determine raises?”

Now you need to make sure the documents the LLM sees are ones Alice is allowed to access. You want to show her the annual review process, how company and individual performance affect raises, and the evaluation criteria that apply to her role. You definitely don’t want to show her everyone’s raises for the last three years. But your chatbot can see all of that information.
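
To make the failure mode concrete, here’s a minimal sketch of the naive pipeline, with `embed` and `vector_db` as hypothetical stand-ins for your embedding model and vector store client. Note that nothing in it knows or cares who Alice is:

```python
# Naive RAG retrieval: no permission check anywhere.
# `embed` and `vector_db` are hypothetical stand-ins for your
# embedding model and vector store client.

def retrieve_context(question: str, k: int = 5) -> list[str]:
    query_vector = embed(question)
    # The similarity search runs over every document ever ingested,
    # including the spreadsheet with three years of raises.
    hits = vector_db.similarity_search(query_vector, top_k=k)
    return [hit.text for hit in hits]
```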

Do you:

  1. Use Google’s APIs to filter results at query time, iteratively?
  2. Sync Google Drive permissions into the same place where the similarity search happens (e.g., your vector DB)?
  3. Try to mirror Google’s entire permissions logic in your own policy language or rule engine, and sync only the relevant metadata (e.g., File:1 belongs to Folder:2, which belongs to Drive:3)?

Let’s consider the options.

Option 1: Query-Time Filtering Using the Third-Party API

This approach is appealing because it avoids data syncing. First you retrieve similar files from your vector DB, then you call Google’s files.list endpoint to get all the files a user is authorized to view, and finally you intersect the two. Simple in theory.

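Here’s roughly what that looks like in Python. The Drive calls use the real v3 files.list endpoint; `embed`, `vector_db`, and the `file_id` metadata field are hypothetical stand-ins for your own pipeline:

```python
from googleapiclient.discovery import build

def authorized_hits(question: str, user_creds, k: int = 5) -> list:
    # Step 1: similarity search, same as the naive version.
    hits = vector_db.similarity_search(embed(question), top_k=k)

    # Step 2: list every file this user is allowed to see.
    # files.list is paginated, so a big drive means many
    # sequential round trips on the critical request path.
    drive = build("drive", "v3", credentials=user_creds)
    visible_ids, page_token = set(), None
    while True:
        resp = drive.files().list(
            pageSize=1000,
            fields="nextPageToken, files(id)",
            pageToken=page_token,
        ).execute()
        visible_ids |= {f["id"] for f in resp.get("files", [])}
        page_token = resp.get("nextPageToken")
        if page_token is None:
            break

    # Step 3: intersect. Only hits backed by a visible file survive.
    return [h for h in hits if h.metadata["file_id"] in visible_ids]
```

That pagination loop is exactly where the latency hides: a user who can see tens of thousands of files costs dozens of sequential API calls before the chatbot can say a word.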

Pros:

  • No syncing or duplication
  • Simple to implement (if the API is robust)

Cons:

  • Catastrophic latency at scale (slow for thousands of files)
  • No “one-shot” queries (you’re stuck iterating)
  • Only viable if the permissions API exists and is fast, centralized, and expressive (rare)

This works okay for prototypes or low-scale use cases where high latency is acceptable (e.g., the user waits a few seconds for a chatbot reply). But if you have to authorize hundreds of vectors this way, that latency adds up. When “a few seconds” starts to get closer to “30 seconds” or “60 seconds,” your users may be less forgiving.

One engineer I spoke with put it bluntly: “A reason we don’t do that approach for general authorization queries is that authorization is typically on the critical request path, and any authorization latency directly leads to request latency.”

What about extending beyond Google to other third-party stores? Not every third-party store has well-defined permissions logic and APIs the way GDrive does. You might not even be able to get the permissions you need for your filter.

Option 2: Sync ACLs into Your Vector DB

In this approach, you sync every ACL from Google into your vector DB as part of your ETL pipeline. Then you can filter the results locally using metadata during similarity search.

This means:

  • You’re syncing the actual access control list (ACL), e.g., which users can access which documents
  • You’re putting it directly into the vector database (or wherever similarity search runs)
  • When results are retrieved, you’re filtering them locally using those ACLs

This keeps both the data and permission metadata in the same place — fast, but might not scale well for large ACLs.
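
Concretely, it might look something like the sketch below. The `upsert` and `filter` syntax here is generic, not any particular vendor’s API, though most vector stores offer some equivalent metadata filter:

```python
# During ETL: attach the synced ACL to each chunk's metadata.
# IDs, field names, and filter syntax are all illustrative.
vector_db.upsert(
    id="file-123:chunk-0",
    vector=embed(chunk_text),
    metadata={
        "file_id": "file-123",
        "allowed_user_ids": ["alice@corp.com", "bob@corp.com"],
    },
)

# At query time: filter on the ACL metadata during the search,
# so only chunks the user can read are ever candidates.
hits = vector_db.similarity_search(
    embed(question),
    top_k=5,
    filter={"allowed_user_ids": {"contains": "alice@corp.com"}},
)
```

The scaling problem is visible right in the metadata: a file shared with 10,000 people means a 10,000-entry list duplicated onto every one of its chunks, re-synced every time a share changes.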

Pros:

  • Fast queries with no round-trips
  • All permissions logic handled locally

Cons:

  • Massive sync burden, especially for large organizations
  • Many third-party APIs don’t expose ACLs
  • ACLs are often incomplete (“group X has access”, now go fetch who’s in group X)
  • Doesn’t scale well, especially when ACLs are massive (i.e., when almost, but not quite, everyone can view a particular file or folder)

This is impractical for many real-world third-party integrations. Even if you could theoretically limit the ACL size (e.g., by filtering for particular sets of users), it’s still difficult to do today because:

  • Many integrations simply don’t have permissions APIs (e.g., Notion — surprising, right?)
  • Many integrations that do have permissions APIs don’t expose one with a centralized view of both the logic and the data. For example, the ACL they give you might say “this group has read access” or “you have read access to this entire folder,” but you still have to figure out who’s in that group and what’s in that folder

So you are stuck guessing who has access to what.
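
Even when a provider does hand you group information, “group X has access” only becomes something you can filter on after you flatten it yourself. That means writing (and constantly re-running) a recursive expansion like this sketch, where `group_members` is a hypothetical map you’ve synced:

```python
def expand_principal(principal: str,
                     group_members: dict[str, list[str]],
                     seen: set[str] | None = None) -> set[str]:
    """Flatten a user or (possibly nested) group into concrete user IDs."""
    if not principal.startswith("group:"):
        return {principal}
    seen = seen if seen is not None else set()
    if principal in seen:  # guard against group membership cycles
        return set()
    seen.add(principal)
    users: set[str] = set()
    for member in group_members.get(principal, []):
        users |= expand_principal(member, group_members, seen)
    return users

# "group:eng has read access" becomes filterable only after expansion:
groups = {
    "group:eng": ["alice@corp.com", "group:platform"],
    "group:platform": ["bob@corp.com"],
}
expand_principal("group:eng", groups)  # {'alice@corp.com', 'bob@corp.com'}
```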

Option 3: Mirror Permissions Logic in Your System

In this model, you write policies (e.g., using Oso’s Polar language) that mimic the logic of the third-party system. Then you sync only the authorization-relevant metadata — folders, groups, ownership chains, etc. — into your own system and run permission checks there.
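
As a rough Python sketch of the idea (in practice you’d express the rules in a policy language like Polar rather than hand-rolled code): sync only the relationship edges and direct grants, then evaluate inheritance locally. All names are illustrative, echoing the File:1 / Folder:2 / Drive:3 example above:

```python
# Synced, authorization-relevant metadata only (not materialized ACLs):
# child -> parent edges plus direct grants. Names are illustrative.
PARENTS = {"file:1": "folder:2", "folder:2": "drive:3"}
GRANTS = {("alice@corp.com", "drive:3"): "reader"}

def can_read(user: str, resource: str) -> bool:
    """Mirror Drive-style inheritance: a grant on any ancestor
    (folder or shared drive) implies read access below it."""
    node = resource
    while node is not None:
        if GRANTS.get((user, node)) == "reader":
            return True
        node = PARENTS.get(node)
    return False

can_read("alice@corp.com", "file:1")  # True, inherited via drive:3
```

A permission check is now a local graph walk instead of an API round trip; the flip side, as the cons below note, is that this walk is your reverse-engineered guess at Google’s semantics.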

Pros:

  • Small, selective sync (just the required subset of authorization-relevant data)
  • Fast, local enforcement
  • Logic is auditable, testable, observable

Cons:

  • Hard to reverse-engineer third-party logic
  • You now own a copy of their logic and have to maintain it
  • Third-party systems evolve and you have to keep up
  • Still doesn’t solve for all edge cases

This is the approach I’ve seen more engineering teams moving toward, especially as the number of data sources grows. But it’s also the most work, and assumes you’ve already solved for identity resolution across systems.

The Elephant in the Room: Who’s Asking?

Permissions don’t mean much if you don’t know who you’re checking them for. Third-party systems all use their own user IDs, group structures, and naming conventions. Mapping those consistently into your system is a prerequisite to enforcing permissions correctly.
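
A common starting point is an explicit identity map that resolves every third-party identity to a single internal user before any permission check runs. A hypothetical sketch, with all IDs invented for illustration:

```python
# Hypothetical identity map: (system, external_id) -> internal user ID.
IDENTITY_MAP = {
    ("google", "alice@corp.com"): "user:alice",
    ("notion", "u_8f3a29"): "user:alice",
    ("jira", "5b10ac8d82e05b22"): "user:alice",
}

def resolve(system: str, external_id: str) -> str:
    """Map a third-party identity to our internal user ID."""
    try:
        return IDENTITY_MAP[(system, external_id)]
    except KeyError:
        # Fail closed: an identity we can't map gets no access.
        raise PermissionError(f"unmapped identity: {system}/{external_id}")
```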

Authentication vs. Authorization? In theory, separate. In practice, inseparable.

So What Should You Do?

There’s no universal answer, but here’s a quick framework based on what I’ve seen:

  • Prototype or low-scale app, users tolerant of latency: start with Option 1 (query-time filtering against the third-party API).
  • Small, stable ACLs and an API that actually exposes them: Option 2 (syncing ACLs into your vector DB) keeps queries fast.
  • Many data sources, real scale, or auditability requirements: Option 3 (mirroring the logic and syncing only metadata) is the most sustainable, and also the most work.

What Comes Next?

Authorization for AI isn’t just about access control; it’s about correctness. It’s about surfacing the right information for the right user at the right time.

As third-party RAG integrations become more common, some teams are rebuilding and centralizing permission logic; others are lobbying for better APIs. It’s still early days, and the right answer varies by stack.

Platforms like Perplexity are already talking about the next step: Actions.

LLMs that don’t just read but also do. Think: “approve this request,” “send this invoice,” “delete this calendar event.” When that happens, access control becomes authorization and intent validation.

Curious how others are solving this? I’d love to hear what you’re trying, whether it’s custom ETLs, homegrown policy engines, or something totally different.