Naveo

A5 TASK

Orbit asks you to write a retry policy as a JSON object that describes which failure types the system retries, how many times, with what backoff, and which ones NEVER get retried.

Your system invokes tools that can fail in five ways:

network_timeout (connection to the provider expired)
rate_limit_exceeded (the provider throttled you)
invalid_response_format (the model returned malformed JSON)
unauthorized (auth failed)
business_validation_failed (e.g. trying to charge an expired card)

For each one, decide whether to retry, how many times, with what backoff (constant, linear, or exponential), and which dead-letter action applies if all retries fail.

Expected format: JSON where each key is the error code and the value is {retry: boolean, max_attempts: int, backoff: "constant" | "linear" | "exponential", initial_delay_ms: int, dead_letter: string}.
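To make the expected shape concrete, here is a minimal sketch of a checker for one policy entry. The helper name validate_entry and the sample values are hypothetical and only illustrate the format; they are not the graded answer.

```python
# Field names and types come from the expected format above.
REQUIRED_FIELDS = {
    "retry": bool,
    "max_attempts": int,
    "backoff": str,
    "initial_delay_ms": int,
    "dead_letter": str,
}
ALLOWED_BACKOFF = {"constant", "linear", "exponential"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one policy entry (empty if none)."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing field: {field}")
        # Exact type check: bool is a subclass of int in Python,
        # so isinstance() would accept True as max_attempts.
        elif type(entry[field]) is not expected:
            problems.append(f"wrong type for {field}")
    backoff = entry.get("backoff")
    if isinstance(backoff, str) and backoff not in ALLOWED_BACKOFF:
        problems.append(f"unknown backoff: {backoff}")
    return problems


# Illustrative entry only; the values are assumptions, not a recommendation.
sample = {
    "retry": True,
    "max_attempts": 3,
    "backoff": "exponential",
    "initial_delay_ms": 500,
    "dead_letter": "queue_for_manual_review",
}
print(validate_entry(sample))
```

Running the same check over all five keys before submitting catches missing fields and misspelled backoff names early.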

Not every error retries the same way

When a tool fails, your agent has three options: retry, escalate, or dead-letter. Picking wrong leads you to one of three classic bugs:

Retry everything: you amplify an outage (all clients hammering the down provider), spend budget on guaranteed errors, or duplicate operations (charges, tickets, emails).
Retry nothing: every network hiccup is a visible failure. Your system looks fragile when it only needed one retry.
Retry without a policy: sometimes once, sometimes 10 times, sometimes forever. Behavior is unpredictable and impossible to observe.

The solution: an explicit policy per error type.

The three error classes

Class	Examples	Retry
Transient	network timeout, 503, rate limit	YES, with exponential backoff
Permanent	unauthorized, 404 not found, schema mismatch	NO. Escalate or dead-letter
Business	expired card, out of stock, invalid data	NO. Notify the user
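The table can be read as a lookup from error code to class, with only the transient class being retried. The sketch below is one possible encoding; the names ErrorClass, ERROR_CLASSES, and should_retry are hypothetical, and treating unknown codes as non-retryable is a conservative assumption, not something the lesson states.

```python
from enum import Enum


class ErrorClass(Enum):
    TRANSIENT = "transient"
    PERMANENT = "permanent"
    BUSINESS = "business"


# Only codes whose class follows directly from the table's examples.
ERROR_CLASSES = {
    "network_timeout": ErrorClass.TRANSIENT,
    "rate_limit_exceeded": ErrorClass.TRANSIENT,
    "unauthorized": ErrorClass.PERMANENT,
    "business_validation_failed": ErrorClass.BUSINESS,
}


def should_retry(error_code: str) -> bool:
    """Retry only transient errors; unknown codes are not retried (assumption)."""
    return ERROR_CLASSES.get(error_code) is ErrorClass.TRANSIENT
```

Defaulting to "no retry" for unclassified codes avoids the "retry everything" failure mode when a new error type appears.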

Exponential backoff

Standard formula: delay = initial_delay × 2^(attempt - 1) + jitter, where attempt counts from 1. With initial_delay = 500ms and jitter omitted:

Attempt 1 fails → wait 500ms.
Attempt 2 fails → wait 1000ms.
Attempt 3 fails → wait 2000ms.
Attempt 4 fails → dead letter.

Jitter (a small random offset added to each delay) avoids the "thundering herd" problem: if 1000 clients fail at the same time and all retry at exactly 500ms, they hit the provider together and worsen the outage. With jitter, the retries spread out.
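A minimal sketch of the delay calculation, assuming attempts are numbered from 1 so that the first wait equals the initial delay (500ms, then 1000ms, then 2000ms). The function name backoff_delay_ms and the max_jitter_ms parameter are hypothetical; uniform jitter is one common choice among several.

```python
import random


def backoff_delay_ms(
    attempt: int,
    initial_delay_ms: int = 500,
    max_jitter_ms: int = 100,
) -> float:
    """Delay to wait after the given failed attempt (1-based), in milliseconds."""
    # Doubles on every attempt: 500, 1000, 2000, ...
    base = initial_delay_ms * 2 ** (attempt - 1)
    # Uniform jitter in [0, max_jitter_ms] spreads clients out in time.
    return base + random.uniform(0, max_jitter_ms)


for attempt in (1, 2, 3):
    print(attempt, round(backoff_delay_ms(attempt)))
```

Setting max_jitter_ms to 0 reproduces the exact schedule in the list above, which is handy for tests.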

Retry budget

For large systems, add a retry budget per session: for example, at most 5 retries in total for the whole agent session, not per error. Without this, a cascade of transient errors can generate hundreds of retries and burn through your quota.
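One way to picture a session-wide budget is a counter shared by every tool call in the session. This is a sketch under that assumption; the class name RetryBudget and the method try_spend are hypothetical.

```python
class RetryBudget:
    """Counts retries across a whole session, regardless of error type."""

    def __init__(self, max_retries: int = 5):
        self.remaining = max_retries

    def try_spend(self) -> bool:
        """Consume one retry if any are left; return False once exhausted."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


budget = RetryBudget(max_retries=5)
results = [budget.try_spend() for _ in range(7)]
print(results)
```

A caller would check try_spend() before each retry and go straight to the dead-letter action when it returns False, even if the per-error max_attempts has not been reached.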

Dead letter: the second parachute

When retries fail, what happens? That's the half of the policy people forget:

queue_for_manual_review: the case goes to a queue a human reviews. Good for rare errors.
alert_oncall: notifies oncall in real time. Good for broken auth and permissions.
notify_user: the user sees the error explicitly and can act. Good for business errors.
fail_visible: the operation fails and the trace stays in logs. Good when the downstream system handles recovery.

A policy without dead-letter is half a policy. Design both sides: what you retry, and what happens when even retrying doesn't fix it.
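Both sides of the policy can be sketched in one loop: retry while the entry allows it, then hand the error to the dead-letter action named in the entry. Everything here is illustrative: ToolError, delay_ms, and run_with_policy are hypothetical names, max_attempts is assumed to count the first try, and jitter and the session budget are left out for brevity.

```python
import time


class ToolError(Exception):
    """Hypothetical exception carrying one of the policy's error codes."""

    def __init__(self, code: str):
        super().__init__(code)
        self.code = code


def delay_ms(entry: dict, attempt: int) -> int:
    """Delay after a failed attempt (1-based) for the entry's backoff type."""
    initial = entry["initial_delay_ms"]
    if entry["backoff"] == "constant":
        return initial
    if entry["backoff"] == "linear":
        return initial * attempt
    return initial * 2 ** (attempt - 1)  # exponential


def run_with_policy(operation, policy, dead_letter_handlers, sleep=time.sleep):
    """Run operation, retrying per policy; dead-letter when retries run out."""
    attempt = 0
    while True:
        attempt += 1
        try:
            return operation()
        except ToolError as err:
            entry = policy[err.code]
            if entry["retry"] and attempt < entry["max_attempts"]:
                sleep(delay_ms(entry, attempt) / 1000)
                continue
            # Retries exhausted or not allowed: hand over to the dead letter.
            dead_letter_handlers[entry["dead_letter"]](err)
            return None
```

Passing sleep as a parameter lets tests record the delays instead of actually waiting. The handlers dictionary maps each dead-letter name (such as alert_oncall or notify_user) to a function that performs it.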

Your task

Write the full policy JSON for the 5 error codes. The judge checks that each one has the correct strategy and an actionable dead-letter.