prompt-injection-testerlisted
Install: claude install-skill NovaCode37/claude-security-skills
# Prompt Injection Tester
A defensive red-team harness for evaluating the prompt-injection resistance of
**LLM applications you own or are authorized to test**. It ships a library of
well-documented public attack techniques and a canary-based detection engine
that decides whether each attack succeeded — then scores overall resilience.
> ⚠️ Use only against systems you own or have permission to test. The payloads
> are public hardening techniques, intended to *strengthen* guardrails.
## When to use this skill
- "Is my chatbot vulnerable to prompt injection / jailbreaks?"
- "Red-team / pentest my LLM app's system prompt."
- "Score how well my guardrails resist instruction-override attacks."
- Regression-testing guardrails in CI after a prompt change.
## Attack categories covered
`instruction-override` · `system-prompt-leak` · `role-play` (DAN-style) ·
`delimiter-escape` · `encoding` (base64/leetspeak) · `data-exfiltration`
(indirect injection) · `refusal-suppression`.
## How it works
1. A unique **canary** secret is embedded into a guarded system prompt.
2. Each payload is sent as the user turn.
3. The response is scored: it's **vulnerable** if it hits an attack
success-marker or leaks the canary; **resisted** if it refuses.
4. You get a **resilience score** (0–100) and a per-category breakdown.
## How to run it
List the payload library (no model calls):
```bash
python skills/prompt-injection-tester/attacker.py --list
python skills/prompt-injection-tester/attacker