Install: claude install-skill mouadja02/skills
# Durable Agent Workflows
**Tier:** POWERFUL
**Category:** AI Agents
**Domain:** Workflow Orchestration / Agent Infrastructure / Reliability Engineering
## Overview
Production AI agents fail in routine ways: LLM rate limits, timeouts, network errors, and context overflows. This skill covers building agent workflows that are **durable** (survive crashes), **observable** (expose what each step is doing), and **recoverable** (resume from any checkpoint). It bridges the gap between prototype agents and production infrastructure.
## When to Use
- Agent pipelines that run for minutes/hours and must not lose state
- Multi-step LLM workflows that need automatic retry with backoff
- Human-in-the-loop approval gates in autonomous agent pipelines
- Agent orchestration that must survive process restarts/deployments
- Long-running research or analysis agents that checkpoint progress
- Multi-agent systems that need coordination and state isolation
- Any agent system going from prototype to production reliability
## Core Concepts
### The Durability Problem
```
❌ Naive Agent (dies on failure):
Step 1 ✓ → Step 2 ✓ → Step 3 ✓ → Step 4 💥 → ALL LOST
✅ Durable Agent (resumes from checkpoint):
Step 1 ✓ → Step 2 ✓ → Step 3 ✓ → Step 4 💥
[restart] → Step 4 ✓ → Step 5 ✓ → Done ✓
```
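The resume behavior in the diagram can be sketched in a few lines. This is a minimal illustration, not a production implementation: `Step`, `CheckpointStore`, and `runDurable` are hypothetical names, the store is an in-memory `Map` standing in for a database or file, and the steps are synchronous for brevity (real LLM calls would be async).

```typescript
// Hypothetical types and names, for illustration only.
type Step = { name: string; run: (input: string) => string };

// In-memory stand-in for a durable store (assumption: a real system
// would persist to a database or file so state survives a process crash).
class CheckpointStore {
  private completed = new Map<string, string>(); // step name -> recorded output
  get(name: string): string | undefined {
    return this.completed.get(name);
  }
  save(name: string, output: string): void {
    this.completed.set(name, output);
  }
}

// Runs steps in order, skipping any step whose output is already checkpointed.
function runDurable(steps: Step[], store: CheckpointStore, input: string): string {
  let current = input;
  for (const step of steps) {
    const saved = store.get(step.name);
    if (saved !== undefined) {
      current = saved; // resume: reuse the recorded result instead of re-running
      continue;
    }
    current = step.run(current); // may throw; earlier checkpoints are kept
    store.save(step.name, current);
  }
  return current;
}

// Demo mirroring the diagram: step 4 crashes once, then succeeds after a restart.
const calls: number[] = [];
let failOnce = true;
const steps: Step[] = [1, 2, 3, 4, 5].map((n) => ({
  name: `step${n}`,
  run: (s: string) => {
    calls.push(n);
    if (n === 4 && failOnce) {
      failOnce = false;
      throw new Error("simulated crash");
    }
    return `${s}>${n}`;
  },
}));

const store = new CheckpointStore();
let firstRunFailed = false;
try {
  runDurable(steps, store, "start");
} catch {
  firstRunFailed = true; // steps 1-3 are already checkpointed at this point
}

// "Restart": the same call resumes at step 4 rather than step 1.
const result = runDurable(steps, store, "start");
console.log(result, calls);
```

On the second call, steps 1 through 3 are read back from the store, so only steps 4 and 5 execute. Keying checkpoints by step name is one simple choice; it assumes step names are unique and that a step's recorded output is safe to reuse.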
### Architecture Patterns
#### Pattern 1: Temporal Workflow (Recommended for Production)
```typescript
// workflow.ts — deterministic orchestration
import { proxyActivities, sleep } from '@temporalio/workflow';
i