If you ask the model "are you sure?" it'll answer one of two things: "yes" or "let me check". Both are noise. The model doesn't have an internal certainty thermostat. It's predicting the next token.
What does work: ask it to adopt an adversarial stance and critique your plan from there.
"Act as <specific skeptical role>. List 3 concrete reasons this
could fail, and for each one tell me what data you'd need to see
to rule the risk out."Three elements:
There are decisions where the cost of being wrong is high: a migration, a reply to an important customer, a production change. In those cases, an extra devil's advocate turn costs little and prevents a lot.
Variant with humans: the same question works in a meeting. "Before we decide, someone play devil's advocate and tell us 3 reasons this could go wrong." People good at this raise the quality of any group decision.
For throwaway or low-risk outputs (an internal email, a quick summary), devil's advocate is overkill. You pay the cost on important decisions, not on every turn.
On the right, two ways to stress-test a model's plan. Which one brings actionable critique?