Echo shows up at the end of the track because the last skill of an LLM systems engineer isn't building; it's knowing what happened when something fails. Without observability, a production bug is a mystery: the user reports "the system said something weird", and you can't reproduce or explain it.
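The solution is not new: distributed tracing, borrowed from the microservices world and adapted for LLMs. A trace follows one request end to end and is made of spans. Each span times one unit of work, such as a retrieval, an LLM call, or a tool call.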
Spans nest. An agent_loop is a parent span containing N child spans, one per tool call. The router is a parent span containing one child span, which is the flow it picked. The shape of a trace is therefore a tree.
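Each span stores only its parent's id, so you have to rebuild the tree when you read a trace back. The sketch below assumes spans arrive as a flat list of dicts. The span names and ids are hypothetical: the router's child is the flow it picked (here an agent_loop), and that flow's children are its tool calls.

```python
from collections import defaultdict

# Hypothetical flat list of spans as they might come back from a trace store.
# parent_span_id is None for the root; names and ids are made up for illustration.
spans = [
    {"span_id": "spn_001", "parent_span_id": None, "name": "router"},
    {"span_id": "spn_002", "parent_span_id": "spn_001", "name": "agent_loop"},
    {"span_id": "spn_003", "parent_span_id": "spn_002", "name": "tool.search"},
    {"span_id": "spn_004", "parent_span_id": "spn_002", "name": "tool.calculator"},
]


def render_tree(spans):
    """Return one line per span, indented by nesting depth (depth-first order)."""
    # Group spans by parent id once, so each lookup during the walk is cheap.
    children = defaultdict(list)
    for span in spans:
        children[span["parent_span_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in children.get(parent_id, []):
            lines.append("  " * depth + span["name"])
            walk(span["span_id"], depth + 1)

    walk(None, 0)  # start from the root span(s), whose parent id is None
    return lines


print("\n".join(render_tree(spans)))
```

This walk starts from the root, so it never reaches a span whose parent_span_id points to a span that does not exist. Any such span is silently dropped. That is one reason to check that every parent id resolves.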
A minimum viable span looks like this:
{
  "span_id": "spn_042",
  "parent_span_id": "spn_001",
  "name": "rag.retrieve",
  "kind": "retrieval",
  "started_at": "2026-05-24T12:34:56.123Z",
  "duration_ms": 187,
  "status": "success",
  "input": "what's the coolant protocol?",
  "output": "[3 snippets]",
  "model": null,
  "tokens_used": null,
  "cost_usd": null,
  "metadata": { "vector_store": "primary", "top_k": 3 }
}

For LLM spans, you add model, tokens_used (with input and output tokens counted separately), and cost_usd. For tool spans, you add tool_name and args.
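The sketch below expresses that record as a Python dataclass. It makes three assumptions:

- tokens_used is a dict with "in" and "out" keys.
- The model name "example-model", its per-million-token prices, and the span values ("spn_043", "llm.answer") are all hypothetical. Your provider's real rates will differ.
- The status type lists the three values this step uses: success, error, and partial.

```python
from dataclasses import dataclass, field, asdict
from typing import Any, Literal, Optional

# Status values used in this step. Literal is a type hint only;
# the dataclass does not validate it at runtime.
Status = Literal["success", "error", "partial"]

# Hypothetical prices in USD per million tokens; use your provider's real rates.
PRICE_PER_MTOK = {"example-model": {"in": 3.00, "out": 15.00}}


@dataclass
class Span:
    span_id: str
    parent_span_id: Optional[str]  # None for the root span
    name: str
    kind: str                      # e.g. "retrieval", "llm", "tool"
    started_at: str                # ISO 8601 timestamp, UTC
    duration_ms: int
    status: Status
    input: str
    output: str
    model: Optional[str] = None                   # LLM spans only
    tokens_used: Optional[dict[str, int]] = None  # LLM spans: {"in": ..., "out": ...}
    cost_usd: Optional[float] = None              # LLM spans only
    tool_name: Optional[str] = None               # tool spans only
    args: Optional[dict[str, Any]] = None         # tool spans only
    metadata: dict[str, Any] = field(default_factory=dict)


def llm_cost_usd(model: str, tokens_used: dict[str, int]) -> float:
    """Input tokens times the input rate plus output tokens times the output rate."""
    price = PRICE_PER_MTOK[model]
    return (tokens_used["in"] * price["in"] + tokens_used["out"] * price["out"]) / 1_000_000


tokens = {"in": 1200, "out": 300}
span = Span(
    span_id="spn_043",
    parent_span_id="spn_001",
    name="llm.answer",
    kind="llm",
    started_at="2026-05-24T12:34:56.320Z",
    duration_ms=912,
    status="success",
    input="what's the coolant protocol?",
    output="[answer text]",
    model="example-model",
    tokens_used=tokens,
    cost_usd=llm_cost_usd("example-model", tokens),
)
print(asdict(span)["cost_usd"])
```

Input and output tokens are priced separately because providers typically charge different rates for each. A single token total would make cost_usd impossible to recompute later.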
partial: when you degrade gracefully (step 14), the trace status is neither success nor error; it is partial. Without this value, degradation stays invisible.

Traces live in observability tools (Datadog, Honeycomb, or any backend you export to through OpenTelemetry). Anything you put there is potentially readable by your whole team and by the observability provider. Rules:
[REDACTED]. Never recoverable.

A trace that leaks PII in logs is a real data-breach story. Document the redaction policy in the schema, not in a wiki nobody reads.
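The redaction rule can be enforced in code before spans leave your process. This sketch assumes spans are plain dicts and that email addresses and phone-like digit runs are the only PII you care about. Both regexes are illustrative, not a complete PII detector, and redact_span is a hypothetical helper. No mapping from the marker back to the original value is stored, so the redaction cannot be undone.

```python
import re

# Illustrative patterns only; a real policy needs a vetted PII detector.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),  # email addresses
    re.compile(r"\+?\d[\d\s().-]{7,}\d"),     # phone-number-like digit runs
]
REDACTED = "[REDACTED]"


def redact(text: str) -> str:
    """Replace every PII match with a fixed marker; nothing is kept to reverse it."""
    for pattern in PII_PATTERNS:
        text = pattern.sub(REDACTED, text)
    return text


def redact_span(span: dict) -> dict:
    """Return a copy of a span dict with string input/output redacted before export."""
    clean = dict(span)
    for key in ("input", "output"):
        if isinstance(clean.get(key), str):
            clean[key] = redact(clean[key])
    return clean


raw = {"span_id": "spn_044", "input": "mail jane.doe@example.com or call +1 555 123 4567", "output": "ok"}
print(redact_span(raw)["input"])
```

Redacting at the point where spans are exported keeps raw values out of every downstream system. You also no longer need to trust each dashboard or retention setting to hide them.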
Write the schema (or a concrete example) of a trace for your system. The judge scores schema coverage against 7 criteria, including trace root, spans, nesting, cost, status, and redaction.
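Before submitting, a self-check can catch structural gaps in a concrete example. The check_trace function below is hypothetical and is not the judge's actual rubric. It only tests a few of the listed items:

- the trace has a single root;
- every parent id resolves;
- every status is a known value;
- every LLM span carries a cost;
- input and output contain no raw email addresses.

It assumes spans are plain dicts shaped like the example in this step.

```python
import re

ALLOWED_STATUS = {"success", "error", "partial"}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")  # illustrative PII check only


def check_trace(spans: list[dict]) -> list[str]:
    """Return problems found in a flat list of span dicts (empty list means none found)."""
    problems = []
    ids = {s["span_id"] for s in spans}

    # Trace root: exactly one span without a parent.
    roots = [s for s in spans if s.get("parent_span_id") is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one root span, found {len(roots)}")

    for s in spans:
        sid = s["span_id"]
        parent = s.get("parent_span_id")
        # Nesting: every parent id must point to a span in this trace.
        if parent is not None and parent not in ids:
            problems.append(f"{sid}: parent {parent} is missing")
        # Status: only the three values used in this step.
        if s.get("status") not in ALLOWED_STATUS:
            problems.append(f"{sid}: unknown status {s.get('status')!r}")
        # Cost: LLM spans must record what they cost.
        if s.get("kind") == "llm" and s.get("cost_usd") is None:
            problems.append(f"{sid}: llm span without cost_usd")
        # Redaction: no raw email addresses in input/output.
        for key in ("input", "output"):
            if EMAIL.search(str(s.get(key, ""))):
                problems.append(f"{sid}: unredacted email in {key}")
    return problems


example = [
    {"span_id": "spn_001", "parent_span_id": None, "kind": "agent", "status": "partial",
     "input": "hi", "output": "[REDACTED]"},
    {"span_id": "spn_002", "parent_span_id": "spn_001", "kind": "llm", "status": "success",
     "cost_usd": 0.0081, "input": "hi", "output": "ok"},
]
print(check_trace(example))
```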