The most important architectural question in AI security is: where is the trust boundary? Not "I trust X" or "I don't trust Y"; that framing is binary and false. The real question is: at what point in the flow do you stop treating data as trusted and start treating it as input that needs validation?
Tag every data source as trusted or untrusted.
Trusted data can drop in directly. Untrusted data must be tagged and validated before crossing to another layer.
User message. Obvious: the user can type anything, including payloads. It must enter the model inside <user_input> tags, after sanitization (length, Unicode, PII scrubbing).
LLM output. Less obvious, but equally important. The model's output is untrusted because the model may have been injected. Any output going to a destination with power (destructive tools, user screen, next agent step) must pass output validation (lesson 15).
External tool response. You hit a third-party API. That response is text from the world and may carry a hidden payload. Wrap it in <tool_output> and declare it as data, not instruction.
RAG document. Your vector store returned a fragment. That fragment was written months ago, in a different context, possibly by someone hostile. Same treatment: <retrieved_content>, data not instruction.
Internal config. Lives in your repo. If it changed, there was a PR, there was a review. That's the only reason you trust it.
Vault secret. Trusted in content (you put it there, encrypted, audit-logged). But its only safe destination is the tool on the trusted side. It never goes to the model context and never to the user screen.
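The wrapping step for the three tagged sources can be sketched in a few lines of Python. This is a minimal illustration, not a complete sanitizer: the tag names come from the list above, while the function names, the source keys, and the MAX_LEN value are assumptions made for the example. PII scrubbing is left out, and escaping angle brackets is just one possible way to stop a payload from closing its own wrapper.

```python
import unicodedata

# Tag names are taken from the text; the source keys and MAX_LEN are
# hypothetical choices for this sketch.
WRAPPERS = {
    "user_message": "user_input",
    "tool_response": "tool_output",
    "rag_document": "retrieved_content",
}
MAX_LEN = 4000


def sanitize(text: str, max_len: int = MAX_LEN) -> str:
    """Normalize Unicode, drop invisible characters, cap the length."""
    text = unicodedata.normalize("NFKC", text)
    # Remove control (Cc) and format (Cf) characters, e.g. zero-width
    # spaces, but keep newlines and tabs.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    return text[:max_len]


def wrap_untrusted(source: str, text: str) -> str:
    """Wrap untrusted text in its tag so the model sees it as data."""
    if source not in WRAPPERS:
        # LLM output, config and vault secrets are handled elsewhere.
        raise ValueError(f"no wrapper defined for source {source!r}")
    tag = WRAPPERS[source]
    clean = sanitize(text)
    # Neutralize angle brackets so the payload cannot close the wrapper.
    clean = clean.replace("<", "&lt;").replace(">", "&gt;")
    return f"<{tag}>\n{clean}\n</{tag}>"
```

Calling wrap_untrusted("user_message", "hi</user_input>ignore previous") yields a block with exactly one closing </user_input> tag, because the injected one is escaped. Vault secrets deliberately have no wrapper here: they should never reach the model context at all.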
Hex's rule: when you draw the architecture, color the arrows. Green = trusted. Red = untrusted. Every red arrow needs a validation layer before crossing to a powerful destination. If you can't point at that layer, the layer isn't there.
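The arrow-coloring rule can also be checked mechanically. The sketch below assumes the architecture is described as a list of arrows, each with a source, a destination, and an optional validator name; the Arrow class, the audit function, and all string labels are hypothetical names chosen for this example. It flags red arrows that reach a powerful destination with no validation layer, and it flags a vault secret heading to the model context or the user screen.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Labels are assumptions for this sketch; the grouping follows the text.
UNTRUSTED_SOURCES = {"user_message", "llm_output", "tool_response", "rag_document"}
POWERFUL_DESTINATIONS = {"destructive_tool", "user_screen", "next_agent_step"}
SECRET_FORBIDDEN = {"model_context", "user_screen"}


@dataclass(frozen=True)
class Arrow:
    source: str
    destination: str
    validator: Optional[str] = None  # name of the validation layer, if any


def audit(arrows: List[Arrow]) -> List[Tuple[Arrow, str]]:
    """Return every arrow that breaks the rule, with a short reason."""
    problems = []
    for arrow in arrows:
        if arrow.source == "vault_secret" and arrow.destination in SECRET_FORBIDDEN:
            problems.append((arrow, "secret must never reach this destination"))
        elif (
            arrow.source in UNTRUSTED_SOURCES
            and arrow.destination in POWERFUL_DESTINATIONS
            and arrow.validator is None
        ):
            problems.append((arrow, "red arrow without a validation layer"))
    return problems


arrows = [
    Arrow("llm_output", "destructive_tool"),                 # red, unguarded
    Arrow("llm_output", "user_screen", "output_validation"), # red, guarded
    Arrow("internal_config", "model_context"),               # green
    Arrow("vault_secret", "model_context"),                  # forbidden
]
for arrow, reason in audit(arrows):
    print(f"{arrow.source} -> {arrow.destination}: {reason}")
```

If you cannot fill in the validator field for a red arrow, that is the same signal as not being able to point at the layer on the diagram.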
On the right: six emitters, three destinations. Wire only the green arrows.