vikast908

accessibility-audit

Use when the user wants a WCAG 2.2 accessibility review of a UI — semantics, keyboard operability, focus management, color contrast, ARIA, forms/labels, reduced-motion, and screen-reader support, including streaming AI output via live regions. Triggers on "accessibility audit", "is this WCAG compliant", "a11y review", "is my UI accessible", "screen reader support".

agent-eval-coverage

Use when the user wants to know whether their AI/agent repo has the evals and tests needed to trust changes — checking for golden/regression test sets, prompt regression tests, LLM-as-judge, behavioral & tool-use tests, hallucination/safety checks, CI gating, and metrics. Triggers on "do I have enough evals", "how do I test my agent", "would I know if a prompt change broke things", "eval coverage", "regression tests for prompts".

agent-reliability

Use when the user wants to know whether an AI agent / tool-using loop will survive the real world — reviewing loop termination, tool error handling, retries/backoff, idempotency, timeouts, state & resumability, guardrails, determinism, rate limits, graceful degradation, and observability/tracing. Triggers on "is my agent reliable", "review the agent loop", "why does the agent hang/loop forever", "production-readiness of my agent".

agent-security

Use when the user wants a security review of an AI/agent/LLM app — prompt injection, secret handling, tool permission scoping, sandboxing, data exfiltration, SSRF via tools, unsafe output handling, over-broad agent autonomy, and the OWASP LLM Top 10. Triggers on "is my agent secure", "security review", "can this be prompt-injected", "review for vulnerabilities", "is it safe to give the agent these tools".

Code & Development

product-review

Use when the user wants a product / PM / product-market-fit review of what their repo actually does — evaluating the customer problem, target users, jobs-to-be-done, core functionality, value & differentiation, scope, adoption/usability, positioning, and gaps. Triggers on "is this useful", "review my product", "PMF check", "who is this for", "what's missing", "product critique".

prompt-quality

Use when the user wants to review the craft of the prompts in an AI/agent repo (not their token cost or injection safety) — clarity, structure, system/developer/user role separation, contradictions, brittle string concatenation, output contracts, few-shot quality, edge-case handling, testability, and maintainability. Triggers on "review my prompts", "are my prompts good", "improve this prompt", "why is the model ignoring instructions", "prompt engineering review".

report-card

Use when the user wants ONE combined quality grade for an AI-agent / LLM-app repo instead of running each review separately — auto-detects which reviews apply, runs them, dedupes overlapping findings, and emits a single overall grade, a per-area scorecard, and a prioritized cross-cutting fix list. Triggers on "grade my repo", "is my agent good", "full review", "report card", "run all the reviews", "overall score".

token-efficiency

Use when the user wants to reduce LLM token usage, context-window pressure, or API cost in an AI/agent codebase without hurting quality — reviewing prompt construction, context assembly, chat-history retention, tool definitions, retrieval, caching, batching, and output verbosity. Triggers on "reduce token usage", "cut LLM costs", "we're burning tokens", "optimize context", "why is this so expensive".

Web & Frontend

ux-audit

Use when the user wants a UX / UI / interaction-design review or redesign of an app, dashboard, editor, canvas, AI/agentic product, web app, or mobile app — including microinteractions, motion, loading, error recovery, empty states, accessibility, perceived performance, and AI trust/observability. Triggers on "audit my UX", "review the interface", "redesign this flow", "is this UI good", "microinteraction review".

Data & Documents

data-visualization

Apply Bertin's *Semiology of Graphics* (visual-encoding grammar) and Tufte's *Visual Display of Quantitative Information* (integrity, density, craft) to any chart, plot, dashboard, infographic, table, map, or network diagram. Use whenever the user designs, builds, reviews, or critiques data visualizations — matplotlib/plotly/d3/ggplot/vega/recharts/chart.js code, BI dashboards (Looker/Tableau/Power BI/Metabase/Superset), report figures, scientific plots, financial charts, exec slides with numbers, or choropleth maps. Trigger even when the user doesn't name Bertin or Tufte and only describes a chart in prose ("make a bar chart of …", "this dashboard feels cluttered", "is this graph misleading?", "what kind of map should I use?"). Walks Bertin's encoding workflow (components → visual variables → perceptual levels), then applies Tufte's filters (Lie Factor, data-ink ratio, chartjunk removal, format choice) for concrete, defensible advice.
