When an LLM conversation fails, it almost never fails on the turn where you notice it. The bad output on turn 6 usually traces back to something that happened turns earlier: a bad assumption, a vague question, a weak answer no one challenged.
To diagnose, you need to see the whole conversation. And to see the whole conversation, you have to have logged it well from the start.

    {
      "conversation_id": "...",
      "system_prompt": "<full text>",
      "model": "claude-opus-4-7",
      "params": { "temperature": 0.7, "max_tokens": 2048 },
      "turns": [
        { "role": "user", "ts": "...", "content": "..." },
        { "role": "assistant", "ts": "...", "content": "..." }
      ]
    }

| Anti-pattern | Why it fails |
|---|---|
| Logging only the last turn | You can't reproduce the bug. |
| Logging without `conversation_id` | You can't reconstruct the session from its individual turns. |
| Logging without `system_prompt` | When the prompt changes, you can no longer tell which version produced a given conversation. |
| Logging with console prints | The output gets lost and can't be searched or filtered. |
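
One way to avoid all four anti-patterns is to append a full snapshot of the conversation to a JSON Lines file after every turn. The sketch below is illustrative, not a prescribed implementation: the function names (`new_conversation`, `add_turn`, `save`, `load_latest`) and the choice of a local `.jsonl` file are assumptions, and only the record fields come from the schema above.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def new_conversation(system_prompt: str, model: str, params: dict) -> dict:
    """Create a record holding everything needed to replay the session."""
    return {
        "conversation_id": str(uuid.uuid4()),
        "system_prompt": system_prompt,  # full text, not a name or a hash
        "model": model,
        "params": params,
        "turns": [],
    }


def add_turn(conversation: dict, role: str, content: str) -> None:
    """Append one turn with a UTC timestamp."""
    conversation["turns"].append({
        "role": role,
        "ts": datetime.now(timezone.utc).isoformat(),
        "content": content,
    })


def save(conversation: dict, log_path: Path) -> None:
    """Append a full snapshot as one JSON line, so it can be searched and filtered."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(conversation, ensure_ascii=False) + "\n")


def load_latest(log_path: Path, conversation_id: str) -> Optional[dict]:
    """Return the most recent snapshot of one conversation, or None if absent."""
    latest = None
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["conversation_id"] == conversation_id:
                latest = record
    return latest
```

Calling `save` after each `add_turn` keeps every intermediate state, and `load_latest` returns the whole conversation for a given `conversation_id`, which is what you need to walk back from turn 6 to the turn that caused the problem. In production you would likely swap the file for a database or log pipeline; the record shape is the part that matters.
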

Advanced tip: if your app lets users edit previous messages (like ChatGPT), your log has to capture every version of each message, not just the last one. Otherwise the log will show the user saying X and the model answering Y, with no record of the earlier wording that actually prompted that answer.
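
A minimal sketch of version-aware logging, assuming each user turn stores a `versions` list and each assistant turn records which version it answered. The `versions` and `reply_to` fields and the helper names are hypothetical additions for illustration; they are not part of the schema shown earlier.

```python
from datetime import datetime, timezone


def _now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def add_user_turn(turns: list, content: str) -> int:
    """Append a user turn with its text stored as version 0; return its index."""
    turns.append({
        "role": "user",
        "versions": [{"ts": _now(), "content": content}],
    })
    return len(turns) - 1


def edit_user_turn(turns: list, index: int, new_content: str) -> None:
    """Keep the old text and append the edit as a new version."""
    turns[index]["versions"].append({"ts": _now(), "content": new_content})


def add_assistant_turn(turns: list, content: str, reply_to: int) -> None:
    """Record the answer along with the turn and version it responded to."""
    current_version = len(turns[reply_to]["versions"]) - 1
    turns.append({
        "role": "assistant",
        "ts": _now(),
        "content": content,
        "reply_to": {"turn": reply_to, "version": current_version},
    })


# Usage: the user edits their first message after the model has answered it.
turns = []
first = add_user_turn(turns, "Summarize the report")
add_assistant_turn(turns, "Here is a summary...", reply_to=first)
edit_user_turn(turns, first, "Summarize the report in French")
add_assistant_turn(turns, "Voici un résumé...", reply_to=first)
```

With this shape, every assistant answer points at the exact wording it was responding to, so an answer that looks unrelated to the final message can be matched to the earlier version that produced it.
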
On the right, two ways to log conversations. Which one lets you debug the turn-6 bug?