In a typical agent turn, the model may invoke the same tool several times in a row to assemble a summary, compare, or verify before acting:
list_inventory_items()add_inventory()list_inventory_items() (again)If your tool touches an external API, a slow DB, or a service with a rate limit, that triple call costs you money, latency, and sometimes it'll trip the rate limiter on you.
The solution is an old, well-known pattern: cache with TTL. The novelty when applied to agent tools:
cached: true flag for diagnosis.✓ Read tools (naturally idempotent). get_X, list_X, lookup_X.
✓ Tools whose data changes slowly. Configuration, catalogs, reference data.
✓ Expensive tools (high latency or per-call cost).
✗ Write tools. Each call does something different. Doesn't make sense.
✗ Tools whose data changes fast. If your TTL is longer than the actual change frequency, you return stale data as if it were fresh.
✗ Tools with results dependent on the user without keying them separately. If you cache get_my_bookings() with key "default", everyone sees the first user's bookings. The cache key has to include the context that distinguishes.
const cache = new Map(); // key → { data, fetchedAt }
const TTL_MS = 60_000;
async function handle({ deck }) {
const key = deck;
const cached = cache.get(key);
if (cached && Date.now() - cached.fetchedAt < TTL_MS) {
return { ok: true, data: cached.data, cached: true };
}
const data = await atmosphericApi.read(deck);
cache.set(key, { data, fetchedAt: Date.now() });
return { ok: true, data, cached: false };
}cached: true is information tooReturning the flag cached: true on hit lets the agent make interesting decisions:
On the right you have get_weather with no cache. Add it: in-memory cache with 60s TTL, keyed by deck, with the cached flag.
Good caching is one of the cheapest, highest-impact optimizations in real MCPs. One hour of work, one order of magnitude less in API cost.