A model's knowledge is frozen at training time. It doesn't know anything that happened afterward, or anything internal to your organization. Two ways to give it access:
RAG is the dominant choice when:
[user query]
↓
retrieve (vector search over the corpus → top-K snippets)
↓
augment (compose a prompt: snippets + original question)
↓
generate (LLM answers using the augmented prompt, citing sources)
↓
[answer + citations]
Your job: design the YAML config of the pipeline. Three steps, chained by output_keys.
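To make the data flow concrete before writing the config, here is a minimal Python sketch of the three steps, where each step reads from a shared state and writes one named output that the next step consumes. That is only one reading of "chained by output_keys". Everything in it is an assumption for illustration: the toy CORPUS, the tokens helper, the key names (snippets, augmented_prompt, answer), and the stub LLM. Word overlap stands in for real vector search so the example has no dependencies.

```python
import re
from typing import Callable

# Hypothetical toy corpus; a real pipeline would query a vector index instead.
CORPUS = {
    "manual-1": "To reset the router, hold the reset button for ten seconds.",
    "manual-2": "The warranty covers hardware defects for two years.",
}


def tokens(text: str) -> set:
    """Lowercased word set, used for the stand-in similarity score."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(state: dict, top_k: int = 2) -> dict:
    """Plain function call, no LLM. Word overlap replaces vector search here."""
    query = tokens(state["question"])
    scored = [(len(query & tokens(text)), doc_id, text) for doc_id, text in CORPUS.items()]
    hits = [(doc_id, text) for score, doc_id, text in sorted(scored, reverse=True) if score > 0]
    return {"snippets": hits[:top_k]}


def augment(state: dict) -> dict:
    """Combine the question and the snippets explicitly into one prompt."""
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in state["snippets"])
    prompt = (
        f"Snippets:\n{sources}\n\n"
        f"Question: {state['question']}\n"
        "Cite the snippet ids you used in brackets."
    )
    return {"augmented_prompt": prompt}


def generate(state: dict, llm: Callable[[str], str]) -> dict:
    """Call the model on the augmented prompt, never on the raw question."""
    return {"answer": llm(state["augmented_prompt"])}


def run_pipeline(question: str, llm: Callable[[str], str]) -> dict:
    """Chain the steps: each one adds its output key to the shared state."""
    state = {"question": question}
    state.update(retrieve(state))
    state.update(augment(state))
    state.update(generate(state, llm))
    return state


# Stub model so the sketch runs offline; swap in a real client call.
def stub_llm(prompt: str) -> str:
    return "Hold the reset button for ten seconds [manual-1]."


result = run_pipeline("How do I reset the router?", stub_llm)
```

The point of the shared state is that generate can only see what augment produced, which is the same constraint the config should express.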
The most common problem isn't retrieval. It's that the generate step doesn't handle the case where the snippets aren't relevant to the question. When that happens and you haven't told the model what to do, it invents an answer from bits of the snippets. That's worse than not answering: it looks confident, carries fake citations, and the user doesn't notice.
Practical rule: in the generator's prompt, always include something like "If the snippets don't contain relevant information, say you didn't find that in the manual." It's the difference between honest RAG and hallucinating RAG.
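As a rough sketch of that rule, the generator prompt can be assembled so the fallback instruction is always present, whether or not retrieval returned anything. The function name build_generator_prompt, the FALLBACK constant, and the exact prompt layout are assumptions for illustration, not a prescribed format.

```python
# Fallback wording taken from the practical rule; adapt it to your corpus.
FALLBACK = (
    "If the snippets don't contain relevant information, "
    "say you didn't find that in the manual."
)


def build_generator_prompt(question: str, snippets: list) -> str:
    """Build the generate-step prompt from (doc_id, text) snippet pairs.

    The fallback instruction is included unconditionally, so the model has an
    explicit way out when the snippets are empty or off-topic.
    """
    if snippets:
        sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in snippets)
    else:
        sources = "(no snippets retrieved)"
    return (
        "Answer the question using only the snippets below, "
        "and cite the snippet ids you used.\n"
        f"{FALLBACK}\n\n"
        f"Snippets:\n{sources}\n\n"
        f"Question: {question}"
    )


# Usage: the same instruction appears with and without snippets.
with_hits = build_generator_prompt(
    "How long is the warranty?",
    [("manual-2", "The warranty covers hardware defects for two years.")],
)
without_hits = build_generator_prompt("What is the CEO's name?", [])
```

Keeping the instruction unconditional matters because the hard case is not an empty result but a non-empty, irrelevant one, which the pipeline cannot reliably detect before the model reads the snippets.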
5 LLM-judge criteria:
retrieve is declared as a tool/function call (not an LLM).
augment receives question + snippets and combines them explicitly.
generate uses the augmented_prompt (not the raw question).
generate asks for citations from the snippets.
generate handles "empty / not-relevant snippets" without hallucinating.