Running an AI agent 24/7 means dealing with provider outages at 3 AM, rate limits during peak hours, and the fun discovery that a prompt that works perfectly on Claude produces garbage on GPT-4. Here are real stories from the trenches of multi-provider AI systems.
A Saturday morning. Primary provider (Claude) returns 503. Fallback kicks in, routes to GPT-4. Everything seems fine — responses come back, tools get called. But the agent starts making subtle mistakes: wrong Notion property names, malformed API calls, slightly off tone.
Root cause: The system prompt was tuned for Claude's tool-calling format. GPT-4 interprets tool descriptions differently — it was calling the right tools with slightly wrong parameters. Not wrong enough to error, just wrong enough to produce bad results.
Fix: Provider-specific prompt variants. Same logic, different phrasing for each model's quirks.
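One way to implement this is a small prompt registry keyed by provider, falling back to a shared base. The provider names and prompt text below are illustrative, not the actual prompts from this incident:

```python
# Hypothetical prompt registry: same agent logic, provider-specific phrasing.
BASE_RULES = "You manage Notion pages. Use exact property names from the schema."

PROMPT_VARIANTS = {
    # Claude follows terse tool descriptions closely.
    "claude": BASE_RULES,
    # GPT-4 tends to paraphrase parameter names; spell out an example.
    "gpt-4": BASE_RULES + (
        " When calling tools, copy parameter names verbatim from the tool "
        "schema. Example: use 'Due date', not 'due_date'."
    ),
}

def system_prompt(provider: str) -> str:
    """Return the prompt variant for a provider, defaulting to the base rules."""
    return PROMPT_VARIANTS.get(provider, BASE_RULES)
```

The key design point: the fallback router selects the prompt *and* the provider together, so a failover never pairs Claude's prompt with GPT-4's tool-calling behavior.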
Under load, Provider A hits rate limits (429). System falls back to Provider B. But the burst of redirected traffic also overwhelms Provider B's rate limits. System tries Provider C — same result. All three providers rate-limited simultaneously.
Root cause: Fallback without backpressure. When Provider A went 429, we dumped ALL traffic onto B instead of gradually shifting.
```python
# Before (bad): immediate full failover
if provider_a.status == 429:
    route_all_to(provider_b)

# After (good): gradual shift with backpressure
if provider_a.status == 429:
    reduce_rate(provider_a, factor=0.5)  # halve traffic to A, don't zero it
    overflow_to(provider_b, max_rate=provider_b.limit * 0.7)  # leave headroom
    queue_excess()  # buffer the rest; don't redirect everything
```
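The "buffer, don't redirect" idea can be sketched with per-provider token buckets. This is a simplified model under assumed names; a production limiter would also react to 429 responses and `Retry-After` headers rather than relying on static rates:

```python
import time

class TokenBucket:
    """Allow `rate` requests/second with a burst allowance of `capacity`."""
    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def route(buckets: dict, queue: list, request) -> str:
    """Try providers in order; queue instead of forcing traffic through."""
    for name, bucket in buckets.items():
        if bucket.try_acquire():
            return name
    # Backpressure: every provider is at its limit, so buffer the request
    # rather than hammering the next provider with the full overflow.
    queue.append(request)
    return "queued"
```

The queue is the part that was missing in the original incident: without it, every 429 on Provider A becomes an immediate extra request on Provider B.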
Agent running on Claude (200K context). Accumulated conversation hits 150K tokens. Provider switch to GPT-4 (128K context). Request rejected: context too large. Agent can't continue the conversation on the fallback provider.
Fix: Context management that respects the minimum window across all configured providers. Or: context summarization before fallback.
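A minimal sketch of the first approach: budget the conversation against the smallest window of any configured fallback candidate, with a safety margin. The model names, limits, and token-counting function are placeholders:

```python
# Illustrative context limits (tokens); check your providers' actual values.
CONTEXT_LIMITS = {"claude-sonnet": 200_000, "gpt-4-turbo": 128_000}

def effective_limit(providers, safety_margin: float = 0.9) -> int:
    """Budget against the smallest window across all fallback candidates."""
    return int(min(CONTEXT_LIMITS[p] for p in providers) * safety_margin)

def trim_history(messages, count_tokens, limit: int):
    """Drop the oldest non-system messages until the conversation fits."""
    messages = list(messages)
    while sum(count_tokens(m) for m in messages) > limit and len(messages) > 1:
        # Keep the system prompt (index 0); evict the oldest turn after it.
        messages.pop(1)
    return messages
```

With this policy the agent never accumulates more context than the weakest fallback provider can accept, so a mid-conversation switch cannot be rejected for size.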
Primary provider: Claude Haiku ($0.25/M input). Outage triggers fallback to GPT-4 ($10/M input). Agent runs for 6 hours on fallback during overnight outage. Morning surprise: $200 bill instead of the usual $5.
Fix: Cost-aware fallback with budget circuits:
```yaml
fallback:
  providers:
    - claude-haiku: { max_daily_cost: 10 }
    - gpt-4o-mini: { max_daily_cost: 20 }
    - gpt-4o: { max_daily_cost: 50 }
  global_daily_limit: 100
  on_budget_exceeded: "degrade_to_cached"
```
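Enforcing a config like this comes down to a per-provider daily spend tracker that skips providers over their cap. A sketch under the assumptions of that config (returning `None` stands in for the `degrade_to_cached` behavior):

```python
class BudgetCircuit:
    """Skip any provider whose daily spend would exceed its cap."""
    def __init__(self, caps: dict, global_limit: float):
        self.caps = caps                      # provider -> max daily cost
        self.global_limit = global_limit
        self.spent = {name: 0.0 for name in caps}

    def pick(self, estimated_cost: float):
        """First affordable provider, or None -> degrade to cached responses."""
        if sum(self.spent.values()) + estimated_cost > self.global_limit:
            return None
        for name, cap in self.caps.items():
            if self.spent[name] + estimated_cost <= cap:
                return name
        return None

    def record(self, name: str, cost: float):
        """Record actual spend after a completed request."""
        self.spent[name] += cost
```

The estimate only needs to be roughly right: the circuit's job is to turn a $200 overnight surprise into a capped, known worst case, not to meter costs to the cent.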
Claude handles nested JSON in tool parameters well. GPT-4 sometimes flattens nested objects or sends arrays as comma-separated strings. Fallback from Claude to GPT-4 caused tool calls to fail with schema validation errors.
Fix: A normalization layer between the LLM output and tool execution that coerces common mismatches (string→array, flat→nested).
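Such a layer can run right before schema validation. The two coercions below (comma-separated string to array, dotted keys to nested objects) are the ones this story hit, not an exhaustive set; the schema shape follows standard JSON Schema:

```python
def normalize_args(args: dict, schema: dict) -> dict:
    """Coerce common LLM output mismatches toward the tool's JSON schema."""
    fixed = {}
    for key, value in args.items():
        expected = schema.get("properties", {}).get(key, {}).get("type")
        # "a, b, c" -> ["a", "b", "c"] when the schema wants an array
        if expected == "array" and isinstance(value, str):
            value = [item.strip() for item in value.split(",")]
        fixed[key] = value
    # "parent.database_id": x -> {"parent": {"database_id": x}}
    for key in list(fixed):
        if "." in key:
            outer, inner = key.split(".", 1)
            fixed.setdefault(outer, {})[inner] = fixed.pop(key)
    return fixed
```

Running this on every provider's output, not just the fallback's, keeps tool behavior identical no matter which model produced the call.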
"The best fallback system is the one you've tested so many times that the real outage feels routine."