Data classification tells you what doesn't go to the model. This lesson tells you how to get it out of the text before it arrives.
PII detection isn't magic. It's a classic NLP problem the industry has solved, badly, several times. Hex has already seen the three main mistakes:
Three layers, ordered by cost:
After scrubbing, the text that reaches the main model has stable placeholders: [EMAIL_1], [NAME_2], [ADDRESS_3]. The system keeps the mapping table on the trusted side. If the model's response needs to re-include the real datum, the system de-anonymizes it on serve.
The rule: the model operates on references, never on real data. The less PII reaches the model, the less PII it can leak. Unreachable is invulnerable.
On the right: five implementations of the scrub_pii_before_prompt tool. Pick the one that holds.