A model that hallucinates is a model that prefers any answer to no answer. That's a training-data artifact: in most training text, somebody completes a question with something. The probability mass on "I don't know" is small. The model has to be steered hard to use it.
This lesson teaches the steering. You write a prompt that forces the model into a three-way decision: confident YES, confident NO, or the literal UNKNOWN. The trap cases include specifics the model could not possibly know: last Tuesday's cargo mass, who signed what on what date. A naive prompt will get the model to invent. A hardened prompt will get it to refuse.
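A minimal sketch of what such a hardened prompt and its output check might look like. The prompt wording, `build_prompt`, and `parse_verdict` are all hypothetical names invented for illustration; the lesson's own runner and grading format are not shown here, so treat this as one possible shape rather than the expected solution.

```python
from typing import Literal

Verdict = Literal["YES", "NO", "UNKNOWN"]

# Hypothetical prompt wording: illustrative only, not a tested recipe.
HARDENED_PROMPT = """\
Answer the question with exactly one word: YES, NO, or UNKNOWN.
Answer YES or NO only if the answer follows from general knowledge
or from the context provided below.
If the question depends on a specific fact you were not given
(a date, a measurement, a name, a signature), answer UNKNOWN.
Do not guess. Do not explain.

Context: {context}
Question: {question}
Answer:"""


def build_prompt(question: str, context: str = "(none)") -> str:
    """Fill the template; with no context supplied, say so explicitly."""
    return HARDENED_PROMPT.format(context=context, question=question)


def parse_verdict(raw: str) -> Verdict:
    """Map raw model text to one of the three verdicts.

    Assumption: any off-format output (a sentence, an invented number)
    is treated as UNKNOWN so that it is never mistaken for a confident answer.
    """
    token = raw.strip().strip(".").upper()
    if token in ("YES", "NO", "UNKNOWN"):
        return token  # type: ignore[return-value]
    return "UNKNOWN"
```

Folding off-format output into UNKNOWN is a design choice. A stricter variant could raise an error instead, which makes format failures visible rather than silently counting them as refusals.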
Once you have a model that reliably returns UNKNOWN on what it can't verify, you've built a foundation for trust. You can then route the UNKNOWN cases to a human, to a tool that does have the data, or to a retry with more context. What you can NOT do is route a confident hallucination, because by definition you can't tell it's a hallucination.
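The routing idea can be sketched in a few lines. Everything here is an assumption made for illustration: `route`, the `lookup` callable standing in for "a tool that does have the data", and the string labels it returns are not part of the lesson.

```python
from typing import Callable, Optional


def route(
    verdict: str,
    question: str,
    lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Decide where a case goes next, based only on the model's verdict.

    `lookup` is a hypothetical tool: it returns the missing fact as text,
    or None if it has nothing for this question.
    """
    if verdict in ("YES", "NO"):
        # Confident answers pass straight through.
        return f"answer:{verdict}"
    if verdict == "UNKNOWN":
        if lookup is not None:
            found = lookup(question)
            if found is not None:
                # The tool had the data: retry the model with it as context.
                return f"retry_with_context:{found}"
        # No tool, or the tool came up empty: hand off to a person.
        return "human_review"
    raise ValueError(f"unexpected verdict: {verdict!r}")
```

Note that the router only ever sees the verdict. A hallucinated YES looks identical to a correct YES at this point, which is why the refusal has to happen inside the prompt rather than downstream.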
On the trap cases, the right answer is UNKNOWN. The training data wants to fill in numbers; your prompt must override that pull. But UNKNOWN for everything is just a different kind of broken. The skill is calibration: tightening exactly enough to refuse on specifics without losing the ability to answer the general.
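One way to keep both failure modes in view is to count them separately across the whole case set. The sketch below assumes each case has already been reduced to an (expected, actual) pair of verdicts; `calibration_report` and its bucket names are hypothetical and not part of the lesson's runner.

```python
from typing import Dict, Iterable, Tuple


def calibration_report(results: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Tally (expected, actual) verdict pairs into four buckets.

    invented:     expected UNKNOWN, model answered anyway (prompt too loose)
    over_refused: expected YES/NO, model said UNKNOWN (prompt too tight)
    wrong:        expected YES/NO, model gave the opposite answer
    """
    report = {"correct": 0, "invented": 0, "over_refused": 0, "wrong": 0}
    for expected, actual in results:
        if actual == expected:
            report["correct"] += 1
        elif expected == "UNKNOWN":
            report["invented"] += 1
        elif actual == "UNKNOWN":
            report["over_refused"] += 1
        else:
            report["wrong"] += 1
    return report
```

A prompt that drives `invented` to zero while `over_refused` climbs has been tightened too far; the goal is for both to reach zero together.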
If a case fails, look at the model's actual output for that case (the runner shows it). The bug is in your prompt, not the model: find what your prompt failed to forbid or failed to allow.