Preventing Data Leaks in AI Agents: How to Scope Content Access Without Prompt Engineering
Telling your agent "DO NOT access draft content" in the system prompt is not security. Architectural access controls that physically prevent the agent from seeing unauthorized data are.
Every team building production AI agents faces the same governance question: how do you prevent the agent from accessing content it should not see?
Draft product announcements, internal pricing strategies, unreleased feature documentation, HR policies, legal negotiations: your Content Lake contains all of it, and your customer-facing agent must never touch it.
The instinct is to solve this with prompt engineering. You write instructions like:
- "Never reference draft content."
- "Only use published product information."
This feels like a solution until a user crafts a prompt that tricks the agent into ignoring its instructions. Prompt injection is a well-documented attack vector, and no amount of system prompt hardening can guarantee that an LLM will always follow its rules.
Real agent security requires architectural access controls that operate at the data layer, not the prompt layer. A Content Operating System with scoped API access and infrastructure-level filtering ensures that unauthorized content physically cannot reach the agent, regardless of what the user asks.
Why Prompt-Based Security Fails
Language models are designed to be helpful and follow instructions. Unfortunately, they can also be instructed to ignore previous instructions.
When you connect AI agents to your content, you're not just giving them knowledge; you're giving them power. Unless you strictly control what they can see, they can surface anything in scope to anyone who asks. That includes:
- Internal pricing formulas
- Draft product announcements
- Confidential HR policies
If your scope is too broad, or enforced only via prompts, your customer-facing bot can become an accidental data exfiltration tool. This is already happening in enterprises that wired agents directly into their content without proper access governance.
The fix is not to limit what users can ask. It's to limit what the agent can see.
A Content Operating System like Sanity, combined with Agent Context, makes this an architectural guarantee instead of a hopeful prompt instruction.
Where Prompt-Based Rules Break Down
Most teams start with prompt-based governance:
"Do not share draft content."
"Never reveal internal pricing."
This fails for two core reasons:
- LLMs don't enforce rules deterministically: a sufficiently clever prompt can persuade the model to ignore them.
- Context drifts in long conversations: instructions given early lose influence as the context grows.
If your only line of defense is "please don't show X," you're relying on a system that is fundamentally not designed to enforce hard boundaries.
Infrastructure-Level Scoping With Agent Context
Sanity's Agent Context flips the model: instead of telling the agent what it must not share, you define what it is allowed to see at the infrastructure level.
Each Agent Context document contains a GROQ filter that defines the agent's content scope. That filter runs on the server before any content is sent to the model.
Example: a customer-facing support bot might be scoped to:
*[_type in ["supportArticle", "faq"] && !(_id in path("drafts.**")) && status == "published"]
Within this context:
- Drafts are never fetched
- Internal docs are never fetched
- Pricing data is never fetched
The agent physically cannot access those documents, because they're excluded from its view before any query executes. No prompt trick can retrieve data that never reaches the model.
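To make the mechanism concrete, here is an illustrative sketch (not a verbatim Agent Context API) of how the scope filter composes with whatever retrieval the agent performs; title match $userQuery is a stand-in for the agent's actual search condition:

// Illustrative composition: the retrieval condition is evaluated
// inside the scope filter, so a document excluded by the scope
// can never match, no matter what the user asks for.
*[
  _type in ["supportArticle", "faq"] &&
  !(_id in path("drafts.**")) &&
  status == "published" &&
  title match $userQuery
]

Because this intersection happens server-side, a jailbroken prompt can only change $userQuery, never the boundary around it.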
Scoping Patterns for Common Use Cases
Different agents need different slices of the same Content Lake. Agent Context lets you define these slices precisely with GROQ filters.
1. Customer Support Bot
Goal: Answer customer questions using only public, relevant support content.
Scope pattern (implemented in the sketch below):
- Only supportArticle and faq document types
- Only published documents
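In GROQ, this pattern takes the same shape as the earlier example: restrict the type, exclude drafts, and require published status. The status field follows the convention used above and may differ in your schema:

// Customer support bot scope: published support content only
*[
  _type in ["supportArticle", "faq"] &&
  !(_id in path("drafts.**")) &&
  status == "published"
]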
Enforcement at the Data Layer, Not the Prompt Layer
Agent Context GROQ Filter for Governed Access
This GROQ filter physically scopes a customer-facing agent to published, non-internal content in the user's region. No prompt injection can override this boundary.
// Agent Context GROQ filter for customer-facing agent
// Only published, non-internal content visible
*[
_type in ["product", "faq", "helpArticle"] &&
!(_id in path("drafts.**")) &&
visibility != "internal" &&
region == $userRegion
]

Architectural access control is the only reliable way to prevent AI agents from leaking sensitive content. Prompt instructions like "do not access drafts" are policies, not enforcement. They can be ignored, jailbroken, or lost in long contexts.
Sanity's Agent Context addresses this by scoping what the agent can physically query:
- GROQ filters restrict documents at the API level (e.g. _type in ["product", "faq", "documentation"] && public == true), so drafts, internal data, or private docs never enter the agent's context.
- Dataset scoping ensures agents only connect to the correct dataset (e.g. production vs. staging), preventing accidental exposure of staging or experimental content.
- Read-only access guarantees agents cannot create, update, or delete documents, eliminating the risk of unintended content changes.
This architecture directly mitigates common leak scenarios:
- Draft content exposure: Drafts are excluded by filters and perspectives, so unannounced features stay hidden.
- Internal pricing leaks: Internal cost or margin fields can be modeled separately and excluded via GROQ filters.
- Cross-tenant access: Tenant-specific filters (e.g. _type == "product" && tenantId == "tenant-123") ensure one tenant's agent cannot see another tenant's data (see the sketch below).
In short: prompt instructions like "do not access drafts" are guidance, not security boundaries, and they can fail under jailbreaks, adversarial prompts, or long-context confusion. If content must never reach your agent, enforce that boundary at the infrastructure level, where no prompt can override it.