In step 11 you implemented ceilings on the runtime side: max_steps protects against runaway loops, loop_detection against the agent getting stuck, and tool errors don't terminate the loop. That protects the system from model bugs.
But the other half is missing: the agent has to KNOW its ceilings and behave accordingly. With only runtime-side ceilings, the agent stops abruptly when it hits one, without warning, leaving the user with an empty answer.
Each turn the runtime injects a section like:
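As a rough illustration, the runtime side of this injection could be sketched as follows. Everything here is an assumption made for the example: the `Budget` class, the `render_budget_section` function, the `tool_calls` ceiling, the 80% warning threshold, and the layout of the rendered text are all hypothetical. Only the idea of a step ceiling comes from max_steps in step 11.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """One ceiling tracked by the runtime (hypothetical structure)."""
    name: str
    used: int
    limit: int

    @property
    def remaining(self) -> int:
        # Never report a negative remainder, even if the counter overshoots.
        return max(self.limit - self.used, 0)


def render_budget_section(budgets: list[Budget], warn_ratio: float = 0.8) -> str:
    """Build the text block the runtime would inject on each turn.

    warn_ratio is an assumed threshold: once usage reaches that fraction
    of the limit, the line is flagged so the agent can warn the user.
    """
    lines = ["## Budget status"]
    for b in budgets:
        flag = " (WARNING: nearly exhausted)" if b.used >= warn_ratio * b.limit else ""
        lines.append(f"- {b.name}: {b.used}/{b.limit} used, {b.remaining} remaining{flag}")
    return "\n".join(lines)


# Example turn: the step ceiling is almost reached, the tool-call ceiling is not.
section = render_budget_section([Budget("steps", 9, 10), Budget("tool_calls", 3, 20)])
print(section)
```

Rendering the section from live counters on every turn, rather than stating the limits once in the system prompt, lets the agent see how much budget is left at the moment it decides what to do next.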
The system prompt instructs the agent to:
null or a cryptic error. The user doesn't know what happened or what they can keep from the work already done.
An agent well-trained in budgets looks considerate: it warns, proposes, condenses, and respects the limits. An agent without budgets looks unstable: sometimes it finishes perfectly, sometimes it leaves the user hanging, and you can never tell which you'll get.
Write the agent's system prompt. It must cover three ceilings, a warning rule, and a partial-answer rule. The judge evaluates your prompt on five criteria.