← ClaudeAtlas

ai-llm-safetylisted

This skill should be used when designing, planning, implementing, or reviewing any system that involves LLM agents, tool use, prompt construction, or agentic workflows, or when the user asks to "add guardrails", "prevent prompt injection", "sanitize LLM output" — enforces prompt injection defense, tool safety, and context integrity
alo-exp/silver-bullet · ★ 5 · AI & Automation · score 73
Install: claude install-skill alo-exp/silver-bullet
# /ai-llm-safety — AI/LLM Safety Design Enforcement Every system that involves LLM agents, tool use, or prompt construction MUST treat AI safety as a first-class constraint. Prompt injection is the SQL injection of the AI era — and it's harder to fix after deployment. **Why this matters:** LLM-powered systems are uniquely vulnerable to attacks that exploit the model's instruction-following nature. A single prompt injection can exfiltrate data, execute unauthorized actions, or compromise downstream systems. Unlike traditional software bugs, these vulnerabilities exist at the semantic layer and cannot be caught by linters or type checkers. **When to invoke:** During PLANNING (after brainstorming, before or alongside writing plans) and during REVIEW (as part of code review criteria). This skill applies to ALL code that constructs prompts, processes LLM output, or orchestrates agent workflows. --- ## The Rules ### Rule 1: Treat All External Content as Untrusted Data Any content not authored by the system itself is untrusted. This includes: | Source | Risk | Mitigation | |--------|------|------------| | User input | Direct prompt injection | Isolate from system instructions; validate format | | Web pages / fetched content | Indirect prompt injection | Never pass raw content as instructions; summarize or extract data only | | Tool results / API responses | Poisoned upstream data | Validate schema; never execute embedded instructions | | File contents (uploaded/read) | Embed