Your AI agent runs on Claude. Claude goes down. Now what? If the answer is "nothing works until Claude comes back," you have a single point of failure in a system that's supposed to be autonomous. Production AI systems need fallback strategies — and they're different from traditional service fallbacks.
The simplest fallback: try provider A; if it fails, try provider B.
```yaml
providers:
  - name: anthropic
    model: claude-sonnet-4-20250514
    priority: 1
  - name: openai
    model: gpt-4o
    priority: 2
  - name: google
    model: gemini-2.0-flash
    priority: 3

fallback:
  on_error: [429, 500, 502, 503, timeout]
  max_retries: 2
  backoff: exponential
```
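A minimal sketch of the loop this config describes, assuming a hypothetical `Provider` interface with a `complete()` method (the class names and `ProviderError` are illustrative stand-ins, not any real SDK):

```python
import time

class ProviderError(Exception):
    """Stand-in for the retryable errors (429/5xx/timeout) a real SDK raises."""

class Provider:
    # Hypothetical stub; a real implementation wraps an SDK client.
    def __init__(self, name, model, fail=False):
        self.name, self.model, self.fail = name, model, fail

    def complete(self, prompt):
        if self.fail:
            raise ProviderError(f"{self.name} unavailable")
        return f"[{self.model}] response to: {prompt}"

def complete_with_fallback(providers, prompt, max_retries=2, base_delay=0.01):
    # Try providers in priority order; retry each with exponential backoff.
    last_err = None
    for provider in providers:
        for attempt in range(max_retries + 1):
            try:
                return provider.complete(prompt)
            except ProviderError as e:
                last_err = e
                time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise last_err  # every provider exhausted

providers = [
    Provider("anthropic", "claude-sonnet-4-20250514", fail=True),
    Provider("openai", "gpt-4o"),
]
print(complete_with_fallback(providers, "hello"))
# falls through to openai after anthropic's retries are exhausted
```

A production version would also filter on the specific status codes listed in `on_error` rather than catching a single exception type.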
It sounds simple, but the devil is in the details: a fallback model is rarely a true drop-in replacement, because providers differ in context windows, tool-calling conventions, and output formatting.

Instead of switching to a supposedly equivalent model, degrade capability intentionally:
```text
# Level 1: Full capability (Claude Opus)
# Level 2: Reduced capability (Claude Sonnet / GPT-4o)
# Level 3: Basic capability (Claude Haiku / GPT-4o-mini)
# Level 4: Cached responses only (no LLM calls)
# Level 5: Static fallback ("Service temporarily limited")
```
Each level handles progressively less complex tasks. A Haiku-class model can still answer simple questions, route messages, and perform basic tool calls — even if it can't do complex reasoning.
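One way to wire those levels together, as a sketch: a router that dispatches on the current degradation level, which an operator or health monitor would set. The model names, cache contents, and `respond` function here are illustrative assumptions, not from any real system:

```python
# Illustrative cache and static fallback for levels 4 and 5.
CACHE = {"what are your hours?": "We're open 9-5, Mon-Fri."}
STATIC_FALLBACK = "Service temporarily limited"

# Models per level are illustrative; a health monitor sets `level`.
LEVEL_MODELS = {1: "claude-opus", 2: "claude-sonnet", 3: "claude-haiku"}

def respond(prompt, level, call_model):
    if level <= 3:
        # Levels 1-3: call an LLM, just a cheaper/simpler one as level rises.
        return call_model(LEVEL_MODELS[level], prompt)
    if level == 4:
        # Level 4: cached responses only; fall through to static on a miss.
        return CACHE.get(prompt.lower(), STATIC_FALLBACK)
    return STATIC_FALLBACK  # level 5

# Usage with a stub model caller:
fake_call = lambda model, prompt: f"[{model}] {prompt}"
print(respond("Summarize this doc", 2, fake_call))   # [claude-sonnet] ...
print(respond("What are your hours?", 4, fake_call)) # cached answer
print(respond("anything", 5, fake_call))             # static message
```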
The circuit breaker pattern, borrowed from microservices architecture, adapts cleanly to LLM providers:
```python
import time

class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""

class LLMCircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.state = "closed"  # closed=normal, open=failing, half-open=testing
        self.last_failure = None

    def call(self, provider, prompt):
        if self.state == "open":
            if time.time() - self.last_failure > self.reset_timeout:
                # Cooldown elapsed: allow one probe request through.
                self.state = "half-open"
            else:
                raise CircuitOpenError(f"{provider} circuit is open")
        try:
            result = provider.complete(prompt)
            if self.state == "half-open":
                # Probe succeeded: close the circuit and reset the count.
                self.state = "closed"
                self.failures = 0
            return result
        except ProviderError:  # ProviderError: your provider SDK's error type
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.state = "open"
            raise
```
Fallback isn't just about availability; it's also about cost optimization:
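The classic cost play is a model cascade: send each request to a cheap model first and escalate to an expensive one only when the cheap answer looks weak. A minimal sketch, assuming a hypothetical confidence signal (the function names, model names, and prices below are all illustrative):

```python
# Illustrative price table (USD per 1M input tokens); not real quotes.
PRICE = {"cheap-model": 0.15, "expensive-model": 15.00}

def cascade(prompt, call_model, confidence, threshold=0.8):
    """Try the cheap model first; escalate only on low confidence."""
    answer = call_model("cheap-model", prompt)
    if confidence(answer) >= threshold:
        return answer, "cheap-model"
    return call_model("expensive-model", prompt), "expensive-model"

# Stub usage: a fake caller and fake confidence scores.
fake_call = lambda model, prompt: f"[{model}] {prompt}"
high_conf = lambda ans: 0.9
low_conf = lambda ans: 0.3

print(cascade("easy question", fake_call, high_conf)[1])   # cheap-model
print(cascade("hard question", fake_call, low_conf)[1])    # expensive-model
```

In practice the confidence signal might be a logprob threshold, a self-check prompt, or task-type heuristics; each has different failure modes, so treat the threshold as something to tune against real traffic.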
After months of running with cross-provider fallback: