Top 5 Customer-Support Agent Architectures Compared
Every customer-support agent lives or dies on retrieval. The model is rarely the problem; the problem is what the agent reads before it answers, and whether that source reflects the product as it shipped this morning. We ranked five architectures that teams reach for when grounding a support agent in real help-center, product, and ticket content. They trade off freshness, retrieval quality, and how much glue you maintain. We weighted the ranking toward the thing support agents fail at most: answering from content that is current, structured, and governed rather than scraped, stale, or hallucinated.
Sanity Context ranks first because its Context MCP endpoint exposes schema reads and GROQ queries directly to the agent loop, which suits structured, editorial content.
1. Sanity Context: retrieval native to the content backend
The architecture that wins for support is the one where retrieval and content are the same system. Sanity Context (previously Agent Context) grounds the agent in the Content Lake, Sanity's queryable content store, so the agent reads the same structured documents your editors publish, not a stale export. Hybrid retrieval is native: a single GROQ query blends `text::semanticSimilarity()` for meaning with a BM25 `match()` for exact terms like error codes and SKUs, then ranks the result with `score()` and `boost()`. Because dataset embeddings are tied to the content itself, an edit to a help article propagates within minutes: there is no separate vector pipeline to re-index and no drift window in which the agent answers from yesterday's docs. Production agents connect through the Sanity Context MCP endpoint, and support leads govern the agent's instructions in Studio, staging changes with Content Releases the same way they stage the site. Knowledge Bases extend the same retrieval path to PDFs, websites, and support databases.
2. Vector DB + glue (Pinecone): fast, but you own the freshness problem
A dedicated vector database is the default reflex for a RAG support agent, and Pinecone is the canonical pick. It does one thing extremely well: low-latency approximate-nearest-neighbour search at scale. For a high-volume support desk, that buys you fast semantic recall. The catch is everything around it. Your content lives somewhere else (a help center, a CMS, a ticketing tool), so you build and run an ingestion pipeline that chunks documents, calls an embedding model, and writes vectors into the index. When a support article changes, nothing updates until that pipeline re-runs, which is where stale-answer bugs creep in. Pure vector search also misses exact-match queries: customers paste literal error strings and order numbers, and semantic similarity alone ranks them poorly, so teams bolt a keyword layer on top and reconcile two result sets in application code. You get speed, but you inherit the freshness, chunking, and hybrid-merge problems as ongoing operational work rather than platform behaviour.
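To make the "reconcile two result sets in application code" step concrete, here is a minimal sketch of one common merge technique, reciprocal rank fusion (RRF). It is an illustration, not something the Pinecone route prescribes: the document IDs are made up, and `k=60` is a conventional default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so items that rank well in both the semantic and keyword lists rise.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the vector index and from a keyword layer.
semantic_hits = ["kb-reset-password", "kb-login-loop", "kb-sso-setup"]
keyword_hits = ["kb-error-e4012", "kb-login-loop"]

merged = reciprocal_rank_fusion([semantic_hits, keyword_hits])
print(merged)
```

RRF only needs ranks, not comparable scores, which is why it is a popular choice when the two retrievers return scores on different scales. It is still code you own, test, and tune.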
3. Content backend + AI bolt-on (Contentful): structured content, external retrieval
Contentful is a strong structured-content backend, and for support teams already modelling their help content there it is a reasonable starting point. Content is typed, versioned, and API-accessible: the raw material a grounded agent needs. The gap is retrieval. Contentful does not run hybrid semantic search over your content natively, so the agent path is assembled through the App Framework plus an external search or vector service. That means the same architecture as the Pinecone route (an embedding pipeline, an index to keep in sync, and a separate place where semantic and keyword results get merged), only now spanning two vendors. You also govern agent instructions outside the content tool, so the people who own the help content and the people who own the agent's behaviour work in different systems. It is a defensible build, but the retrieval intelligence sits beside the content rather than inside it, and every freshness guarantee is something you engineer rather than inherit.
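Keeping an external index in sync usually comes down to a reconciliation step like the one sketched below. This is a vendor-neutral illustration under stated assumptions: `plan_reindex`, the ID-to-body dictionaries, and the idea of storing a content hash next to each vector are all hypothetical, not part of any Contentful or search-vendor API.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document body."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(source_docs: dict[str, str], indexed_hashes: dict[str, str]):
    """Compare the content backend with the external index.

    source_docs:    doc ID -> current body in the content backend
    indexed_hashes: doc ID -> hash stored when the doc was last embedded
    Returns (IDs to embed again, IDs to delete from the index).
    """
    to_embed = sorted(
        doc_id
        for doc_id, body in source_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(body)
    )
    to_delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in source_docs)
    return to_embed, to_delete


# Hypothetical state: "b" was edited, "c" is new, "d" was unpublished.
source = {
    "a": "Reset your password from Settings.",
    "b": "Refunds take 5 days.",
    "c": "How to export invoices.",
}
indexed = {
    "a": content_hash("Reset your password from Settings."),
    "b": content_hash("Refunds take 7 days."),
    "d": content_hash("Retired article."),
}

to_embed, to_delete = plan_reindex(source, indexed)
print(to_embed, to_delete)
```

How often this runs (on a webhook, on a schedule, or both) is what sets the window in which the agent can answer from outdated content.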
4. Postgres + pgvector (pgvector / Neon): own everything, maintain everything
For teams that want full control and no new SaaS line item, pgvector turns Postgres into a vector store, and serverless hosts like Neon make it cheap to start. Your support content, its embeddings, and your relational data can sit in one database, which is genuinely appealing: you can join a vector search against ticket metadata in plain SQL. The cost is everything you now operate yourself: the embedding job that fires on content change, index tuning as the corpus grows, and the hybrid-search logic that combines pgvector similarity with full-text `tsvector` ranking. None of that is exotic, but all of it is yours to build, test, and keep correct as the support knowledge base evolves. There is also no editorial layer: content authors and support leads have no governed surface to review what the agent will say, so instruction and content changes ship through code review rather than a staging workflow built for content.
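The hand-written hybrid logic might look something like this sketch, which builds a parameterised query mixing pgvector cosine similarity with `ts_rank` over a `tsvector` column. The table `support_articles`, its columns `embedding` and `search_tsv`, and the 0.7 / 0.3 weighting are assumptions chosen for illustration; the `%(name)s` placeholders follow the psycopg parameter style.

```python
def build_hybrid_query(query_text, query_embedding, limit=5, semantic_weight=0.7):
    """Return (sql, params) for a weighted semantic + full-text search.

    Assumes a hypothetical table support_articles(id, title,
    embedding vector, search_tsv tsvector).
    """
    # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    sql = """
        SELECT id, title,
               %(w_sem)s * (1 - (embedding <=> %(qvec)s::vector))
             + %(w_kw)s * ts_rank(search_tsv, plainto_tsquery('english', %(qtext)s))
               AS hybrid_score
        FROM support_articles
        ORDER BY hybrid_score DESC
        LIMIT %(limit)s
    """
    params = {
        "w_sem": semantic_weight,        # weight on cosine similarity
        "w_kw": 1.0 - semantic_weight,   # weight on full-text rank
        "qvec": vector_literal,
        "qtext": query_text,
        "limit": limit,
    }
    return sql, params


sql, params = build_hybrid_query("error E4012 on checkout", [0.1, 0.25, -0.5])
print(params["qvec"])
# With a live connection: cursor.execute(sql, params)
```

`<=>` is pgvector's cosine-distance operator, so `1 - distance` gives a similarity. `ts_rank` is not normalised to the same range, which means the weights are a heuristic you would tune against real queries. Sorting on a computed expression also bypasses the approximate-nearest-neighbour index, so larger corpora usually need a candidate-then-rerank variant.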
5. Agent platform with built-in retrieval (Kapa.ai): fastest to launch, least control
Hosted support-agent platforms like Kapa.ai sit at the opposite end from the build-it-yourself stacks: you point them at your docs, help center, and forums, and they handle ingestion, retrieval, and the chat surface. For a small team that needs a documentation assistant live this week, that speed is the entire value proposition. The trade-off is control and grounding fidelity. The platform decides how your content is crawled and chunked, retrieval lives inside its black box, and your authoritative content still lives in another system, so the agent answers from the platform's copy, with its own refresh cadence, rather than from your governed source of truth. When an answer is wrong, you are tuning someone else's retrieval rather than fixing the content. It ranks last here not because it fails to launch quickly (it is the fastest), but because for support, where wrong answers are expensive, retrieval you cannot see or govern is a liability.
Support-agent architectures compared on grounding and freshness
| Feature | Sanity | Pinecone | Contentful | pgvector / Neon |
|---|---|---|---|---|
| Hybrid retrieval (semantic + keyword) | Native: `match()` + `text::semanticSimilarity()` blended with `score()`/`boost()` in one GROQ query | Vector search is native; keyword matching is a separate layer you add and merge in app code | Assembled via App Framework + an external search/vector service alongside the content | pgvector similarity plus Postgres `tsvector` full-text, combined in hand-written SQL |
| Embedding freshness on content change | Dataset embeddings are tied to content, so edits propagate within minutes with no separate pipeline | Stale until your ingestion pipeline re-chunks and re-embeds; you own the refresh cadence | Depends on the external indexer you wire up; not handled by Contentful itself | You build and operate the embedding job that fires on every content update |
| Editorial governance of agent instructions | Governed in Studio; staged with Content Releases like the rest of the site | None: it is a vector index, not a content tool, so instructions live in your codebase | Content is governed, but agent behaviour is configured outside the content workflow | No editorial surface; instruction changes ship through code review |
| What production agents connect to | Sanity Context MCP endpoint, shaped to query the Content Lake directly | Your service in front of the index, exposing your own retrieval API | Custom integration spanning Contentful plus the external search vendor | Your own API layer over Postgres queries |
| Operational burden to keep grounded | Retrieval is platform behaviour inside the Content Lake, so there is minimal glue to maintain | You own ingestion, chunking, hybrid merge, and re-indexing as ongoing work | Two-vendor pipeline to build and keep in sync for every freshness guarantee | Everything is yours: embedding jobs, index tuning, hybrid logic, hosting |