When you start shipping agents, the provider or infra team hands you the rate limit with one justification: "so they don't bill you extra". That's true. It's also only half the story.
Once your assistant hits tools that read data or trigger actions, the rate limit is also your defense against volume attacks. There are three dimensions to limit on:
1. Per IP / per user. The most obvious. Limits damage from a single-identity attacker.
2. Per sensitive argument. If your tool takes order_id, user_id, or email, limit calls that share the same value, even when they come from different IPs. This is what stops a distributed attacker who already has a target list.
3. Per cost, not per call. Some calls are cheap (200 tokens) and some are expensive (10k tokens). An attacker who stays under a calls-per-minute cap can still run up cost by sending only expensive calls. Limit tokens-per-minute, not just calls-per-minute.
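The three dimensions above can be sketched as one small sliding-window limiter that is reused with different keys and costs. This is a minimal in-memory illustration, not a production design: the class name, the `check_request` helper, and every budget value are assumptions made up for the example, and a real deployment would likely keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Caps the total cost recorded per key within a rolling time window."""

    def __init__(self, budget, window_seconds=60.0):
        self.budget = budget
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> deque of (timestamp, cost)

    def allow(self, key, cost=1, now=None):
        now = time.monotonic() if now is None else now
        q = self.events[key]
        # Drop events that have aged out of the window.
        while q and now - q[0][0] >= self.window:
            q.popleft()
        if sum(c for _, c in q) + cost > self.budget:
            return False
        q.append((now, cost))
        return True


# Hypothetical budgets per 60-second window; tune them for your own traffic.
per_user_calls = SlidingWindowLimiter(budget=30)       # 1. per identity
per_order_calls = SlidingWindowLimiter(budget=5)       # 2. per sensitive argument
per_user_tokens = SlidingWindowLimiter(budget=20_000)  # 3. per cost (tokens)


def check_request(user_id, order_id, tokens, now=None):
    """Return the name of the first limit that rejects the call, or None.

    Simplification: a call rejected by a later check still counts against
    the limits checked before it.
    """
    checks = [
        ("per_user_calls", per_user_calls, f"user:{user_id}", 1),
        ("per_order_calls", per_order_calls, f"order:{order_id}", 1),
        ("per_user_tokens", per_user_tokens, f"user:{user_id}", tokens),
    ]
    for name, limiter, key, cost in checks:
        if not limiter.allow(key, cost, now):
            return name
    return None
```

With these assumed budgets, six different users asking about the same order_id inside one minute would trip `per_order_calls` on the sixth call, even though no single user came close to their own limit. That is the distributed-attacker case from item 2.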
A rate limit that only blocks adds no intelligence. A rate limit that blocks and logs tells you who is probing your limits. That signal feeds your monitoring, fires alerts when the pattern looks suspicious, and lets you adjust before it becomes an incident.
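A rough sketch of "block and log", assuming a hypothetical `record_block` helper that your limiter calls each time it rejects a request. The logger name, the log fields, and the alert threshold are illustrative assumptions. The counter here never resets, which a real system would replace with a time-windowed count.

```python
import logging
from collections import Counter

logger = logging.getLogger("rate_limit")

ALERT_THRESHOLD = 3  # hypothetical: blocked calls per identity before alerting
blocked_counts = Counter()


def record_block(identity, limit_name, key):
    """Log a blocked call and report whether this identity warrants an alert."""
    blocked_counts[identity] += 1
    count = blocked_counts[identity]
    # A structured-looking line that a monitoring pipeline can parse and count.
    logger.warning(
        "rate_limit_block identity=%s limit=%s key=%s blocked_total=%d",
        identity, limit_name, key, count,
    )
    return count >= ALERT_THRESHOLD
```

The return value is the hook for alerting: the caller can page someone, tighten that identity's budget, or just tag the session for review.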
The rate limit doesn't catch the skilled attacker. It raises their cost. It slows them down. It removes their volume advantage. That's what you can do from the defensive side. The rest is responding to alerts in time.
On the right: two configurations for the same public endpoint. Pick the one that survives a motivated attacker.