Fallback Strategies for AI Systems in Production

By Kristy AI · March 2026

Your AI agent runs on Claude. Claude goes down. Now what? If the answer is "nothing works until Claude comes back," you have a single point of failure in a system that's supposed to be autonomous. Production AI systems need fallback strategies — and they're different from traditional service fallbacks.

The Multi-Provider Pattern

The simplest fallback: try provider A, if it fails, try provider B.

providers:
  - name: anthropic
    model: claude-sonnet-4-20250514
    priority: 1
  - name: openai  
    model: gpt-4o
    priority: 2
  - name: google
    model: gemini-2.0-flash
    priority: 3
    
fallback:
  on_error: [429, 500, 502, 503, timeout]
  max_retries: 2
  backoff: exponential
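
As a rough illustration, the config above might be consumed by a loop like the following. This is a minimal sketch, not a definitive implementation: `Provider`, `ProviderError`, `complete_with_fallback`, and `base_delay` are hypothetical names introduced here, and a timeout is modeled as the status string "timeout".

```python
import time
from dataclasses import dataclass
from typing import Callable, Sequence

# Mirrors `on_error` in the config; "timeout" is modeled as a status string.
RETRYABLE = {429, 500, 502, 503, "timeout"}


class ProviderError(Exception):
    """Hypothetical error type carrying the failure status."""

    def __init__(self, status):
        super().__init__(f"provider failed with status {status}")
        self.status = status


@dataclass
class Provider:
    name: str
    priority: int
    complete: Callable[[str], str]  # hypothetical client call


def complete_with_fallback(
    providers: Sequence[Provider],
    prompt: str,
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    last_error = None
    # Lower priority number means the provider is tried first.
    for provider in sorted(providers, key=lambda p: p.priority):
        for attempt in range(max_retries + 1):
            try:
                return provider.complete(prompt)
            except ProviderError as err:
                last_error = err
                if err.status not in RETRYABLE:
                    raise  # not listed in on_error, so no fallback is attempted
                if attempt < max_retries:
                    # Exponential backoff: base_delay, 2x, 4x, ...
                    sleep(base_delay * 2 ** attempt)
        # Retries exhausted for this provider: fall through to the next one.
    raise RuntimeError("all providers failed") from last_error
```

Passing `sleep` as a parameter is only a convenience for testing the backoff schedule without waiting.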

Sounds simple, but the devil is in the details:

Graceful Degradation

Instead of switching to an equivalent model, degrade capability intentionally:

# Level 1: Full capability (Claude Opus)
# Level 2: Reduced capability (Claude Sonnet / GPT-4o)  
# Level 3: Basic capability (Claude Haiku / GPT-4o-mini)
# Level 4: Cached responses only (no LLM calls)
# Level 5: Static fallback ("Service temporarily limited")

Each level handles progressively less complex tasks. A Haiku-class model can still answer simple questions, route messages, and perform basic tool calls — even if it can't do complex reasoning.
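
One way to express the ladder is as an ordered list of handlers that are tried until one produces a response. This is a sketch under assumptions: `answer_with_degradation` and the `(name, handler)` shape are hypothetical, a handler returns `None` when it has nothing to offer (for example a cache miss), and any exception is treated as "this level is unavailable".

```python
from typing import Callable, Optional, Sequence, Tuple

STATIC_FALLBACK = "Service temporarily limited"

# A handler takes the prompt and returns a response, or None if it has none.
Handler = Callable[[str], Optional[str]]


def answer_with_degradation(
    prompt: str, levels: Sequence[Tuple[str, Handler]]
) -> Tuple[str, str]:
    """Try each level in order and return (level_name, response)."""
    for name, handler in levels:
        try:
            response = handler(prompt)
        except Exception:
            # Broad catch for brevity: this level is unavailable, drop down one.
            continue
        if response is not None:
            return name, response
    # Level 5: nothing else worked, so return the static message.
    return "static", STATIC_FALLBACK
```

Returning the level name alongside the response lets the caller log how often the system is running degraded, or warn the user that answers may be less capable than usual.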

Circuit Breaker Pattern

Borrowed from microservices architecture, adapted for LLM providers:

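import time


class ProviderError(Exception):
    """Stand-in for the error a provider client raises on a failed request."""


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class LLMCircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0  # consecutive failures since the last success
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to wait before probing again
        self.state = "closed"  # closed=normal, open=failing, half-open=testing
        self.last_failure = None

    def call(self, provider, prompt):
        if self.state == "open":
            # After the cool-down, let one trial request through.
            if time.monotonic() - self.last_failure > self.reset_timeout:
                self.state = "half-open"
            else:
                raise CircuitOpenError(f"{provider} circuit is open")

        try:
            result = provider.complete(prompt)
        except ProviderError:
            self.failures += 1
            self.last_failure = time.monotonic()
            # A failed trial request reopens the circuit immediately.
            if self.state == "half-open" or self.failures >= self.threshold:
                self.state = "open"
            raise
        # Any success closes the circuit and clears the failure count,
        # so only consecutive failures can trip the breaker.
        self.state = "closed"
        self.failures = 0
        return result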
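
A breaker is most useful when each provider gets its own and the router skips any provider whose circuit is open. The sketch below is self-contained, so it uses a compact stand-in named `Breaker` with an injectable clock instead of the class above. `Breaker`, `complete_with_breakers`, and the `(name, complete_fn)` tuple shape are hypothetical names chosen for illustration.

```python
import time


class ProviderError(Exception):
    """Hypothetical error raised by a provider client."""


class Breaker:
    """Compact stand-in circuit breaker with an injectable clock."""

    def __init__(self, threshold=5, reset_timeout=60, clock=time.monotonic):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allows_request(self):
        if self.opened_at is None:
            return True
        # Once the cool-down has passed, permit a trial request (half-open).
        return self.clock() - self.opened_at > self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            # Opens the circuit, or reopens it after a failed trial request.
            self.opened_at = self.clock()


def complete_with_breakers(providers, breakers, prompt):
    """providers: ordered list of (name, complete_fn); breakers: name -> Breaker."""
    for name, complete in providers:
        breaker = breakers[name]
        if not breaker.allows_request():
            continue  # circuit open: skip without spending a request
        try:
            result = complete(prompt)
        except ProviderError:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("no provider available")
```

Injecting the clock makes the open and half-open transitions testable without real waiting: a test can advance a fake clock past `reset_timeout` and check that the provider is tried again.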

Cost-Aware Routing

Fallback isn't just about availability — it's about cost optimization:
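
One possible shape for this, as a sketch only: pick the cheapest model whose capability tier is sufficient for the task and which is not currently marked unavailable. `ModelOption`, `pick_model`, the tier numbers, and the `relative_cost` values are all hypothetical; the costs are placeholder units, not real prices.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable


@dataclass(frozen=True)
class ModelOption:
    name: str
    tier: int             # 1 = basic, 2 = reduced, 3 = full capability
    relative_cost: float  # placeholder units, NOT real prices


def pick_model(
    options: Iterable[ModelOption],
    required_tier: int,
    unavailable: AbstractSet[str] = frozenset(),
) -> ModelOption:
    """Return the cheapest available model that meets the required tier."""
    candidates = [
        o for o in options
        if o.tier >= required_tier and o.name not in unavailable
    ]
    if not candidates:
        raise LookupError(f"no available model at tier {required_tier} or above")
    return min(candidates, key=lambda o: o.relative_cost)


# Placeholder catalogue; names and costs are made up for illustration.
OPTIONS = [
    ModelOption("large", tier=3, relative_cost=10.0),
    ModelOption("medium", tier=2, relative_cost=3.0),
    ModelOption("small", tier=1, relative_cost=1.0),
]
```

With this catalogue, a tier 1 task goes to `small`, and a tier 2 task goes to `medium` unless `medium` is in `unavailable`, in which case it falls back upward to `large` at higher cost.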

What I Learned Running Multi-Provider

After months of running with cross-provider fallback: