So far you've thought of chains as sequences: A → B → C → D. That works when each step really depends on the previous one. But often the middle steps are independent of each other, all feeding off the same upstream and returning to the same downstream.
When that happens, running them serially gives away latency for nothing. The chain becomes a graph: a fan-out that parallelizes the middle, and a fan-in that joins everything.
Three conditions, all required:
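1. Same upstream: every middle step waits only for classify to finish.
2. Independence: none waits for another parallel step.
3. Same downstream: all of them feed the same step afterwards.

If all three hold, the middle of the graph runs as a parallel batch: three LLM calls at the same time. The latency of that batch is the slowest of the three, not the sum.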
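The "slowest, not the sum" claim can be checked with a small simulation. This is only a sketch: `fake_llm_call` and the durations in `DURATIONS` are hypothetical stand-ins for real LLM calls, and `asyncio.gather` is one possible way to fire the batch, not necessarily what the course runtime does.

```python
import asyncio
import time

# Hypothetical durations in seconds; stand-ins for real LLM latencies.
DURATIONS = {"tech": 0.10, "people": 0.20, "severity": 0.30}


async def fake_llm_call(name: str, seconds: float) -> str:
    """Pretend to be an LLM call by sleeping for `seconds`."""
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_serial() -> float:
    """Run the three calls one after another; return elapsed seconds."""
    start = time.perf_counter()
    for name, seconds in DURATIONS.items():
        await fake_llm_call(name, seconds)
    return time.perf_counter() - start


async def run_parallel() -> float:
    """Run the three calls as one concurrent batch; return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(fake_llm_call(n, s) for n, s in DURATIONS.items()))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    serial = await run_serial()
    parallel = await run_parallel()
    return serial, parallel


if __name__ == "__main__":
    serial, parallel = asyncio.run(main())
    print(f"serial:   {serial:.2f}s")    # roughly the sum of the durations
    print(f"parallel: {parallel:.2f}s")  # roughly the longest duration
```

With these made-up numbers the serial run takes about the sum of the three durations, while the parallel run takes about as long as the slowest call.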
        [classify]
            ↓
    ┌───────┼───────┐
    ↓       ↓       ↓
  [tech] [people] [severity]   ← parallel, all wait on classify
    └───────┼───────┘
            ↓
       [recommend]             ← fan-in: waits for all 3 above
            ↓
        [digest]               ← close

Thinking of chains as graphs, not lists, is what separates the novice designer from the expert. A list is a special case of a graph (when there's only one path). The graph gives you parallelism for free when dependencies allow it.
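One way to make "graph, not list" concrete is to write the steps as a dependency map and let a topological sort group them into batches. The sketch below uses Python's standard `graphlib.TopologicalSorter`; the `DEPENDENCIES` map is transcribed from the diagram, while `parallel_batches` is a hypothetical helper written for illustration, not part of any particular chain runtime.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it must wait for (taken from the diagram).
DEPENDENCIES = {
    "classify": set(),
    "tech": {"classify"},
    "people": {"classify"},
    "severity": {"classify"},
    "recommend": {"tech", "people", "severity"},
    "digest": {"recommend"},
}


def parallel_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into batches; every step in a batch can start together."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished, unlocking the next
    return batches


if __name__ == "__main__":
    for batch in parallel_batches(DEPENDENCIES):
        print(batch)
    # ['classify']
    # ['people', 'severity', 'tech']
    # ['recommend']
    # ['digest']
```

A purely sequential chain such as A → B → C → D goes through the same function and comes out as four batches of one step each, which is the "list as a special case" in practice.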
On the right: six steps of the captain's digest. Some are parallel (their detail says so), some aren't. Order them respecting real dependencies. When two steps can run at the same time, put them consecutively in the order indicated by their numbers. The runtime will fire them together.