Your RAG Pipeline Is a Product Decision, Not an Engineering One

TL;DR
- RAG architecture choices (what to index, how to chunk, when to retrieve, what to cite) are product decisions with direct UX and trust implications
- Most product teams delegate RAG entirely to engineering, then wonder why the AI feature hallucinates or returns irrelevant answers
- A product manager who understands retrieval tradeoffs will build a fundamentally better AI product than one who treats RAG as a black box
Every AI product team has had this conversation.
The feature launches. The AI assistant answers questions from the company's knowledge base. In the demo, it's brilliant. In production, a customer asks something slightly outside the indexed content, retrieval comes back empty-handed, and the model confidently fabricates an answer that sounds authoritative and is completely wrong.
The PM says: "Why did it hallucinate?" The engineer says: "The retrieval didn't find a relevant chunk." The PM says: "Can you fix the retrieval?" The engineer says: "What do you want me to optimise for?"
Silence. Because the PM doesn't know. They specified "build a RAG-powered assistant" and assumed the rest was implementation detail.
It isn't. RAG architecture is a stack of product decisions disguised as engineering choices. Every layer of the pipeline (what you index, how you chunk it, how you retrieve, how you rank, how you present sources) has direct implications for user experience, trust, accuracy, cost, and latency. If the product manager isn't making those decisions, nobody is. And the AI feature will reflect that absence.
The anatomy of a RAG decision stack
Let me walk through the pipeline and show where the product decisions hide.
What to index
This is content strategy, not database administration.
Most teams start by indexing everything. Every document, every wiki page, every Slack message, every support ticket. The assumption is that more data means better answers.
It doesn't. More data means more retrieval noise. If your index contains outdated policies, draft documents, internal debates, and contradictory information, the retrieval system will surface contradictions. The model will try to reconcile them. Sometimes it will succeed. Sometimes it will average two contradictory answers into one confidently wrong answer.
The product decision: What is the authoritative source of truth for each type of question? If a customer asks about your return policy, should the retrieval hit the current policy document, last year's policy document, the Slack thread where someone debated changing the policy, and the support ticket where an agent made a one-time exception? Or should it hit only the current policy document?
Curating your index is a product decision about what counts as ground truth. Most teams skip it. Then they spend months debugging hallucinations that are actually retrieval pollution.
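To make "curation is product work" concrete, here's a minimal sketch of an index gate. The document records, the `status` field, and the rule that only published documents count are all illustrative assumptions; the point is that exclusion happens before indexing, not as a ranking tweak afterwards.

```python
from datetime import date

# Hypothetical document records; in practice these come from your CMS or wiki export.
docs = [
    {"id": "policy-2025", "type": "policy", "status": "published", "updated": date(2025, 1, 10)},
    {"id": "policy-2024", "type": "policy", "status": "archived",  "updated": date(2024, 1, 8)},
    {"id": "draft-faq",   "type": "faq",    "status": "draft",     "updated": date(2025, 3, 2)},
]

# Curation rule: only published documents count as ground truth.
# Everything else is excluded from the index entirely, not just down-ranked.
AUTHORITATIVE_STATUSES = {"published"}

def curate(documents):
    """Return only the documents that qualify as ground truth."""
    return [d for d in documents if d["status"] in AUTHORITATIVE_STATUSES]

indexable = curate(docs)
# Only the current published policy survives curation.
```

The useful part of this sketch is the explicit `AUTHORITATIVE_STATUSES` rule: it forces someone to write down, in one place, what counts as truth.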
How to chunk
Chunking (how you split documents into retrievable segments) determines the granularity of your AI's knowledge. It's also where most teams make their first costly mistake.
Too small (100 to 200 tokens): You get precise retrieval but lose context. The chunk contains the answer but not the surrounding context that makes the answer meaningful. The model fills in the context from its own training data, which may be wrong for your specific domain.
Too large (2,000+ tokens): You get rich context but diluted relevance. The chunk contains the answer somewhere in a wall of text, along with a lot of irrelevant information that confuses the ranking and burns tokens.
The sweet spot depends on the content type. FAQs chunk well at 200 to 400 tokens (one question-answer pair per chunk). Technical documentation chunks well at 500 to 800 tokens (one concept per chunk with enough context). Long-form policy documents need hierarchical chunking: section-level chunks for broad retrieval, paragraph-level chunks for precise answers.
This is a product decision because it determines the character of your AI's responses. Small chunks produce terse, precise answers. Large chunks produce contextually rich but sometimes unfocused answers. The right choice depends on what your users need, and that's a product question, not an engineering one.
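A sketch of the underlying mechanic, for the curious: fixed-size chunking with overlap, so context at chunk boundaries isn't lost. Word counts stand in for tokens here; a real pipeline would count with the tokenizer of its embedding model, and the window sizes would come from the content-type guidance above.

```python
def chunk_words(text, target_words=120, overlap=20):
    """Split text into overlapping word-window chunks.

    Words approximate tokens for illustration. The overlap means the
    last `overlap` words of one chunk reappear at the start of the next,
    so an answer straddling a boundary is still retrievable whole.
    """
    words = text.split()
    step = target_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + target_words]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + target_words >= len(words):
            break
    return chunks

# A 300-word document at these settings yields three overlapping chunks.
chunks = chunk_words("lorem " * 300)
```

Hierarchical chunking for long policy documents would run this twice, once at section granularity and once at paragraph granularity, and index both.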

How to retrieve
Retrieval strategy is where cost, latency, and accuracy collide.
Semantic search (vector similarity): Good at finding conceptually related content even when the user's query doesn't match the exact wording. Bad at precise lookups. If the user asks "What's the SLA for enterprise support?" and the document says "Premium tier response time commitment," semantic search will find it. Keyword search won't.
Keyword search (BM25, full-text): Good at exact matches. Bad at conceptual leaps. Fast and cheap. If the user asks for "SLA" and the document uses "SLA," keyword search is faster and more precise than vector similarity.
Hybrid (both): Use semantic search for conceptual matching and keyword search for precision, then merge the results. Better quality. Higher latency. More complex to tune.
The product decision: What's the acceptable latency for this feature? What types of queries do your users actually ask (precise lookups vs. conceptual exploration)? How much retrieval cost per query can your unit economics absorb?
If you're building a customer support assistant where users ask specific questions about their account, keyword search might dominate. If you're building a research tool where analysts explore a knowledge base with fuzzy queries, semantic search is essential. If you're building both, you need hybrid, and you need to tune the weighting.
This decision cascades into your unit economics. Vector embedding and storage costs are non-trivial at scale. A 10-million-document index with daily re-embedding is a different cost proposition than a 10,000-document index updated weekly. The product manager who ignores this will be surprised when the infrastructure bill arrives.
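One common way to do the "merge the results" step in a hybrid setup is reciprocal rank fusion, which combines ranked lists using only positions, so you don't have to normalise incompatible score scales. A minimal sketch, with toy document ids standing in for real results:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge multiple ranked lists of document ids via RRF.

    `rankings` holds one ranked id list per retriever (e.g. one from
    vector search, one from BM25). k=60 is the commonly used smoothing
    constant; larger k flattens the influence of top ranks.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["a", "b", "c"]   # hypothetical vector-search ranking
keyword  = ["b", "d", "a"]   # hypothetical BM25 ranking
merged = reciprocal_rank_fusion([semantic, keyword])
# Documents found by both retrievers ("a" and "b") rise to the top.
```

Tuning the hybrid weighting, in this framing, means deciding whether one retriever's rankings should count for more than the other's, and that decision should follow from the query mix your users actually produce.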
How to rank and filter
Retrieval returns a ranked list of chunks. Which ones do you feed to the model?
Top-K with a relevance threshold: Take the top 5 (or 10, or 20) chunks that score above a minimum relevance threshold. Simple. Predictable. But the right K depends on the query complexity. A simple factual question needs 2 to 3 chunks. A complex analytical question might need 10 to 15.
Relevance-adaptive retrieval: Dynamically adjust how many chunks to include based on the query type and the relevance score distribution. If the top 3 chunks all score above 0.9, you probably have a clear answer. If the top 10 chunks all score between 0.5 and 0.6, the knowledge base might not contain a good answer, and you should tell the user that instead of forcing the model to synthesise a mediocre one.
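The gating logic above fits in a few lines. This sketch assumes cosine-style relevance scores in [0, 1]; the 0.9 and 0.65 thresholds are placeholders you'd tune against your eval suite, not magic numbers.

```python
CONFIDENT_THRESHOLD = 0.90   # assumed: top score above this means a clear answer
RELEVANCE_FLOOR     = 0.65   # assumed: chunks below this are noise, not evidence

def select_chunks(scored_chunks, max_k=10):
    """Choose what to pass to the model based on the score distribution.

    Returns (chunks, confident). confident=False is the signal for the
    product to hedge, decline, or escalate instead of forcing an answer.
    """
    above_floor = [(c, s) for c, s in scored_chunks if s >= RELEVANCE_FLOOR]
    if not above_floor:
        return [], False   # the knowledge base has no good answer
    top = sorted(above_floor, key=lambda cs: cs[1], reverse=True)[:max_k]
    confident = top[0][1] >= CONFIDENT_THRESHOLD
    return [c for c, _ in top], confident

chunks, confident = select_chunks([("c1", 0.95), ("c2", 0.72), ("c3", 0.40)])
```

Note that the function returns a signal, not a sentence: what the user sees when `confident` is False is exactly the product decision discussed next.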
The product decision here is about honesty. When the retrieval is uncertain, does your AI say "I don't have a confident answer for this" or does it generate something plausible from whatever scraps it found? The first builds trust. The second builds a hallucination factory. I've seen product teams explicitly choose the second because "saying 'I don't know' looks bad." It looks worse when the wrong answer costs a customer money.
How to present sources
Citation is where retrieval quality becomes visible to users.
If your AI answers a question and cites the source, the user can verify. If it answers without citation, the user has to trust blindly. In enterprise contexts (legal, financial services, healthcare), blind trust is a non-starter. In consumer contexts, citation builds confidence and reduces support tickets from users who want to verify.
The product decision: Do you show source documents? Do you link to specific sections? Do you show confidence indicators? Do you let users rate whether the cited source actually supported the answer?
These aren't nice-to-haves. They're the difference between an AI feature users trust and one they route around. And they're entirely product decisions that engineering can't make for you.
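One way to make those decisions enforceable is to bake them into the response contract between the pipeline and the UI. This is an illustrative shape, not a standard schema; the field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Citation:
    doc_id: str
    section: str    # deep link target: a specific section, not just the document
    excerpt: str    # the exact retrieved text the answer rests on

@dataclass
class AssistantAnswer:
    text: str
    confidence: str                      # e.g. "high" / "low", surfaced to the user
    citations: list = field(default_factory=list)

# If citations are required fields in the contract, an uncited answer
# becomes a bug the team can see, not a UX choice made by omission.
answer = AssistantAnswer(
    text="Returns are accepted within 30 days of purchase.",
    confidence="high",
    citations=[Citation("policy-2025", "returns", "Items may be returned within 30 days...")],
)
```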
The product manager's RAG checklist
Before you hand a RAG feature to engineering, you should be able to answer these:
- What are the authoritative sources? Which documents or data stores should the AI treat as ground truth? What should be excluded?
- How current does the information need to be? Real-time? Daily? Weekly? This determines your indexing cadence and cost.
- What does a good answer look like for each query type? Precise and short? Contextually rich? With citations? Without?
- What should happen when the AI doesn't know? Decline to answer? Escalate to a human? Say "I'm not confident" and offer what it found?
- What's the acceptable latency? Sub-second for inline assistance. 2 to 3 seconds for research queries. This constrains your retrieval architecture.
- What's the cost per query your pricing model can absorb? This constrains your model choices, retrieval complexity, and chunk sizes.
If you can't answer these, you haven't done the product work. You've delegated a product problem to engineering and called it a technical implementation.
RAG is product taste expressed in architecture
The best AI products I've used have a common quality: the retrieval feels intentional. The answers are grounded. The sources are relevant. When the system doesn't know, it says so. When it does know, the answer is precise and well-scoped.
That quality doesn't come from a better embedding model or a fancier vector database. It comes from a product manager who understood the retrieval tradeoffs and made deliberate choices about what to index, how to chunk, when to retrieve, and how to handle uncertainty.
RAG isn't plumbing. It's the nervous system of your AI feature. Treat it as a product decision, and your AI feature earns trust. Treat it as an engineering implementation detail, and you'll spend the next six months debugging hallucinations you could have prevented with a content strategy.
Build narrow, well-scoped AI systems with deliberate retrieval, not broad ones that retrieve everything and hope for the best.
Frequently Asked Questions
Should the product manager understand the technical details of vector databases and embedding models?
You don't need to choose the vector database or select the embedding model. You need to understand what they do and what tradeoffs they create. A product manager who understands that embedding models have different strengths for different content types, that vector search has a cost and latency profile, and that chunking strategy affects answer quality will make better product decisions. You're not writing the code. You're setting the constraints that determine whether the code produces a good product.
How do you test RAG quality before launch?
Build an eval suite. Collect 50 to 100 representative questions from your domain. For each question, define what a good answer looks like and which source documents should be retrieved. Run your pipeline against this suite and measure: Did it retrieve the right documents? Did the answer match the expected quality? Did it hallucinate? This is your eval infrastructure applied to retrieval. Without it, you're shipping blind.
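The retrieval half of that eval suite can be this small to start. The sketch below measures question-level recall@k against a golden set; `retrieve` here is a toy lookup standing in for your real pipeline, and the case format is an assumption.

```python
def run_retrieval_eval(cases, retrieve, k=5):
    """Score a retrieval pipeline against a golden set.

    `cases` is a list of {"question", "expected_doc_ids"} dicts and
    `retrieve(question, k)` returns ranked doc ids; both are assumptions
    about your pipeline's interface. A case counts as a hit if any
    expected document appears in the top k.
    """
    hits = 0
    for case in cases:
        retrieved = set(retrieve(case["question"], k))
        if retrieved & set(case["expected_doc_ids"]):
            hits += 1
    return hits / len(cases)

# Toy retriever standing in for the real pipeline:
fake_index = {"return policy": ["policy-2025"], "pricing": ["pricing-page"]}
def retrieve(question, k):
    return fake_index.get(question, [])[:k]

cases = [
    {"question": "return policy", "expected_doc_ids": ["policy-2025"]},
    {"question": "sla",           "expected_doc_ids": ["sla-doc"]},
]
score = run_retrieval_eval(cases, retrieve)
# One of two questions retrieved the right document.
```

Answer quality and hallucination checks layer on top of this, but retrieval recall is the cheapest signal to automate first: if the right document never reaches the model, nothing downstream can fix it.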
What's the biggest RAG mistake you've seen product teams make?
Indexing everything without curation. Teams dump their entire knowledge base, including outdated docs, internal drafts, and contradictory information, into the index and wonder why the AI gives inconsistent answers. The fix is simple but requires product work: define what counts as authoritative, exclude everything else, and maintain the index like you'd maintain any other product asset.
Logan Lincoln
Product executive and AI builder based in Brisbane, Australia. Nine years in regulated B2B SaaS, currently shipping production AI platforms.