prompt-injection-tester

Red-team an LLM application against prompt injection and jailbreaks using a curated, categorized payload library and canary-based detection, then produce a resilience score. Use when the user asks to "test my chatbot for prompt injection", "check if my AI app is jailbreakable", "red-team my LLM", "evaluate prompt-injection defenses", or audit a system prompt's guardrails.
NovaCode37/claude-security-skills · ★ 8 · AI & Automation · score 74

Install: claude install-skill NovaCode37/claude-security-skills

# Prompt Injection Tester

A defensive red-team harness for evaluating the prompt-injection resistance of **LLM applications you own or are authorized to test**. It ships a library of well-documented public attack techniques and a canary-based detection engine that decides whether each attack succeeded, then scores overall resilience.

> ⚠️ Use only against systems you own or have permission to test. The payloads
> are public hardening techniques, intended to *strengthen* guardrails.

## When to use this skill

- "Is my chatbot vulnerable to prompt injection / jailbreaks?"
- "Red-team / pentest my LLM app's system prompt."
- "Score how well my guardrails resist instruction-override attacks."
- Regression-testing guardrails in CI after a prompt change.

## Attack categories covered

`instruction-override` · `system-prompt-leak` · `role-play` (DAN-style) · `delimiter-escape` · `encoding` (base64/leetspeak) · `data-exfiltration` (indirect injection) · `refusal-suppression`.

## How it works

1. A unique **canary** secret is embedded into a guarded system prompt.
2. Each payload is sent as the user turn.
3. The response is scored: it's **vulnerable** if it hits an attack success-marker or leaks the canary; **resisted** if it refuses.
4. You get a **resilience score** (0–100) and a per-category breakdown.

## How to run it

List the payload library (no model calls):

```bash
python skills/prompt-injection-tester/attacker.py --list
python skills/prompt-injection-tester/attacker