Two mental categories the agent needs to recognize:
Read-only tools. They query state and don't modify it. Invoking them a thousand times has the same effect as invoking them once. Examples: lookup_crewmate, list_inventory_items, get_shift_status. Low risk: when in doubt, the agent can over-invoke without harm.
Side-effect tools. They modify state, send notifications, spend money, or create things that need to be cleaned up later. Examples: assign_shift, transfer_credits, send_broadcast, delete_user. High risk: the agent has to know they're destructive or costly so it handles them carefully (confirm beforehand, don't retry blindly).
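The "don't retry blindly" rule can be pictured as a small registry that tags each tool and retries only the read-only ones. This is a minimal sketch: ToolSpec, the side_effect flag, and call_with_retry are hypothetical names for illustration, not part of any framework, and the two handlers are stand-ins that fail on purpose.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical registry entry: a handler plus a side-effect flag."""
    name: str
    handler: Callable[..., Any]
    side_effect: bool


def call_with_retry(tool: ToolSpec, max_attempts: int = 3, **kwargs: Any) -> Any:
    """Retry read-only tools on failure; call side-effect tools at most once."""
    attempts = 1 if tool.side_effect else max_attempts
    last_error: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return tool.handler(**kwargs)
        except Exception as exc:  # broad on purpose, this is only a sketch
            last_error = exc
    raise RuntimeError(f"{tool.name} failed after {attempts} attempt(s)") from last_error


# Stand-in handlers (assumed behavior): both hit transient failures.
calls = {"read": 0, "write": 0}


def get_shift_status(crew_id: str) -> str:
    calls["read"] += 1
    if calls["read"] < 3:
        raise TimeoutError("transient failure")
    return f"{crew_id}: on shift"


def transfer_credits(amount: int) -> str:
    calls["write"] += 1
    raise TimeoutError("transient failure")


READ_TOOL = ToolSpec("get_shift_status", get_shift_status, side_effect=False)
WRITE_TOOL = ToolSpec("transfer_credits", transfer_credits, side_effect=True)
```

Calling call_with_retry(READ_TOOL, crew_id="c1") keeps trying until the read succeeds, while call_with_retry(WRITE_TOOL, amount=10) gives up after a single attempt, because a second transfer could move the credits twice.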
The problem: the model can't infer which is which from the name alone. send_broadcast seems obvious to you, but to the model "send" can be as innocent as send_query (a read) or as serious as send_payment (irreversible). You have to spell out the difference in the description.
Say explicitly "side effect" / "mutates state" / "sends to humans". That phrase is the signal the model uses to treat the call more carefully.
Say whether it's reversible. "This action is not reversible." changes the agent's behavior: instead of invoking on the first prompt, it'll confirm.
Suggest confirming with the user before invoking. "Before invoking, confirm with the user the content and the channel." is an instruction the model respects. You saw it in Track 2 with DecisionFlow.
Limit the triggers. "Do not use as a response to hypothetical questions or queries. Only when the user explicitly asks for a broadcast." closes the door to wrong invocations.
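Put together, the four points above might produce a description like the one in this sketch. It uses transfer_credits rather than send_broadcast. The dictionary layout (name, description, input_schema) follows a common JSON Schema style for tool definitions; the parameter names, the wording, and the missing_signals helper are assumptions made for illustration. The phrases it checks are not the validator's.

```python
# Hypothetical tool definition; field names and wording are illustrative.
TRANSFER_CREDITS_TOOL = {
    "name": "transfer_credits",
    "description": (
        "Side effect: mutates state by moving credits between two crew accounts. "
        "This action is not reversible. "
        "Before invoking, confirm the amount and both accounts with the user. "
        "Do not use in response to hypothetical questions or balance queries. "
        "Only when the user explicitly asks for a transfer."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "from_account": {"type": "string"},
            "to_account": {"type": "string"},
            "amount": {"type": "integer", "minimum": 1},
        },
        "required": ["from_account", "to_account", "amount"],
    },
}

# Assumed signal phrases, one per point: side effect, reversibility,
# confirmation, and trigger limits.
REQUIRED_SIGNALS = ("side effect", "not reversible", "confirm", "only when")


def missing_signals(description: str) -> list[str]:
    """Return the signal phrases that the description fails to mention."""
    lowered = description.lower()
    return [signal for signal in REQUIRED_SIGNALS if signal not in lowered]
```

A bare description such as "Transfers credits." fails all four checks, which is exactly the case where the model has nothing but the tool name to go on.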
On the right you have send_broadcast with the full schema. Only the description is missing. Convince the agent that this is not just another tool.
The validator looks for three key phrases:
A technique that pays off: for tools with serious effects, offer a dry-run flag. E.g. send_broadcast(..., dry_run: true) returns what would be sent, without sending. The agent can use it to check before committing. Mention this in the description and the agent will take advantage of it.
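A handler with a dry-run flag could look roughly like the sketch below. The channel and message parameters, the shape of the return value, and the in-memory SENT list standing in for real delivery are all assumptions; the lesson's actual schema may differ.

```python
from typing import Dict, List

# Stand-in for the real delivery channel (assumption for this sketch).
SENT: List[Dict[str, str]] = []


def send_broadcast(channel: str, message: str, dry_run: bool = False) -> Dict[str, object]:
    """Send a broadcast, or only preview it when dry_run is True."""
    payload = {"channel": channel, "message": message}
    if dry_run:
        # Nothing leaves the system; the caller only sees what would be sent.
        return {"sent": False, "preview": payload}
    SENT.append(payload)
    return {"sent": True, "preview": payload}
```

The agent can call it once with dry_run=True, show the preview to the user, and repeat the call without the flag only after the user confirms.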
A Track 3 rule that carries over to production: every destructive tool must be obviously destructive from its description. If you need to read the handler code to know it's destructive, you have already lost.