langchain-incident-runbook

Incident response procedures for LangChain production issues: provider outages, high error rates, latency spikes, and cost overruns. Trigger: "langchain incident", "langchain outage", "langchain production issue", "langchain emergency", "langchain down", "LLM provider outage".

Category: AI & Automation | 2,266 stars | 315 forks | Updated today | License: MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# LangChain Incident Runbook

## Overview

Standard operating procedures for LangChain production incidents: provider outages, error rate spikes, latency degradation, memory issues, and cost overruns.

## Severity Classification

| Level | Description | Response Time | Example |
|-------|-------------|---------------|---------|
| SEV1 | Complete outage | 15 min | All LLM calls failing |
| SEV2 | Major degradation | 30 min | >50% error rate, >10s latency |
| SEV3 | Minor degradation | 2 hours | <10% errors, slow responses |
| SEV4 | Low impact | 24 hours | Intermittent issues, warnings |

## Runbook 1: LLM Provider Outage

### Detect

```bash
# Check provider status pages
curl -s https://status.openai.com/api/v2/status.json | jq '.status'
curl -s https://status.anthropic.com/api/v2/status.json | jq '.status'
```

### Diagnose

```typescript
import { ChatOpenAI } from "@langchain/openai";
import { ChatAnthropic } from "@langchain/anthropic";

// Send a minimal "ping" prompt to each provider and record whether it succeeds.
async function diagnoseProviders() {
  const results: Record<string, string> = {};

  try {
    const openai = new ChatOpenAI({ model: "gpt-4o-mini", timeout: 10000 });
    await openai.invoke("ping");
    results.openai = "OK";
  } catch (e: any) {
    // Fall back to String(e) so a non-Error throw does not crash the diagnostic.
    results.openai = `FAIL: ${String(e?.message ?? e).slice(0, 100)}`;
  }

  try {
    const anthropic = new ChatAnthropic({ model: "claude-sonnet-4-20250514" });
    await anthropic.invoke("ping");
    results.anthropic = "OK";
  } catch (e: any) {
    results.anthropic = `FAIL: ${String(e?.message ?? e).slice(0, 100)}`;
  }

  console.table(results);
  return results;
}
```

### Mitigate

```typescript
// Enable fallbac...
```
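The thresholds in the Severity Classification table can be encoded as a small helper so that alerts and on-call tooling agree on the level. The following is a minimal sketch, not part of the original runbook: `IncidentMetrics`, `classifySeverity`, `SLOW_LATENCY_MS`, and `MINOR_ERROR_RATE` are hypothetical names. The table does not say how to treat error rates between 10% and 50%, what counts as a "slow" response, or where SEV3 ends and SEV4 begins, so the sketch assumes the conservative choice (escalate the 10-50% band to SEV2) and uses placeholder values for the other two.

```typescript
// Hypothetical metrics snapshot; field names are illustrative, not from the runbook.
interface IncidentMetrics {
  errorRate: number;    // fraction of failed LLM calls in the window, 0..1
  p95LatencyMs: number; // 95th percentile latency in milliseconds
}

type Severity = "SEV1" | "SEV2" | "SEV3" | "SEV4";

// Response-time targets from the severity table, expressed in minutes.
const RESPONSE_MINUTES: Record<Severity, number> = {
  SEV1: 15,
  SEV2: 30,
  SEV3: 120,
  SEV4: 1440,
};

// Assumption: the table does not define "slow responses"; 3 s is a placeholder.
const SLOW_LATENCY_MS = 3000;
// Assumption: below a 1% error rate, failures count as "intermittent" (SEV4).
const MINOR_ERROR_RATE = 0.01;

function classifySeverity(m: IncidentMetrics): Severity {
  // SEV1: all LLM calls failing.
  if (m.errorRate >= 1) return "SEV1";
  // SEV2: >50% error rate or >10 s latency, per the table.
  if (m.errorRate > 0.5 || m.p95LatencyMs > 10000) return "SEV2";
  // Assumption: the 10-50% band is unassigned in the table; escalate to SEV2.
  if (m.errorRate >= 0.1) return "SEV2";
  // SEV3: <10% errors or slow responses (thresholds are placeholders).
  if (m.errorRate >= MINOR_ERROR_RATE || m.p95LatencyMs > SLOW_LATENCY_MS) {
    return "SEV3";
  }
  // SEV4: intermittent issues and warnings.
  return "SEV4";
}

// Look up how quickly the incident must be acknowledged, in minutes.
function responseDeadlineMinutes(m: IncidentMetrics): number {
  return RESPONSE_MINUTES[classifySeverity(m)];
}
```

For example, `classifySeverity({ errorRate: 0.6, p95LatencyMs: 2000 })` returns `"SEV2"`, which maps to a 30-minute response target. Teams that prefer to treat the 10-50% band as SEV3 would only need to change the third check.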

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
