vikast908
User
Grade your AI-agent repo. One command, one report card: scored, evidence-backed reviews (reliability, security, prompts, evals, token cost, UX, accessibility) for AI/LLM apps.
Indexed Skills (10)
accessibility-audit
Use when the user wants a WCAG 2.2 accessibility review of a UI — semantics, keyboard operability, focus management, color contrast, ARIA, forms/labels, reduced-motion, and screen-reader support, including streaming AI output via live regions. Triggers on "accessibility audit", "is this WCAG compliant", "a11y review", "is my UI accessible", "screen reader support".
agent-eval-coverage
Use when the user wants to know whether their AI/agent repo has the evals and tests needed to trust changes — checking for golden/regression test sets, prompt regression tests, LLM-as-judge, behavioral & tool-use tests, hallucination/safety checks, CI gating, and metrics. Triggers on "do I have enough evals", "how do I test my agent", "would I know if a prompt change broke things", "eval coverage", "regression tests for prompts".
agent-reliability
Use when the user wants to know whether an AI agent / tool-using loop will survive the real world — reviewing loop termination, tool error handling, retries/backoff, idempotency, timeouts, state & resumability, guardrails, determinism, rate limits, graceful degradation, and observability/tracing. Triggers on "is my agent reliable", "review the agent loop", "why does the agent hang/loop forever", "production-readiness of my agent".
agent-security
Use when the user wants a security review of an AI/agent/LLM app — prompt injection, secret handling, tool permission scoping, sandboxing, data exfiltration, SSRF via tools, unsafe output handling, over-broad agent autonomy, and the OWASP LLM Top 10. Triggers on "is my agent secure", "security review", "can this be prompt-injected", "review for vulnerabilities", "is it safe to give the agent these tools".
product-review
Use when the user wants a product / PM / product-market-fit review of what their repo actually does — evaluating the customer problem, target users, jobs-to-be-done, core functionality, value & differentiation, scope, adoption/usability, positioning, and gaps. Triggers on "is this useful", "review my product", "PMF check", "who is this for", "what's missing", "product critique".
prompt-quality
Use when the user wants to review the craft of the prompts in an AI/agent repo (not their token cost or injection safety) — clarity, structure, system/developer/user role separation, contradictions, brittle string concatenation, output contracts, few-shot quality, edge-case handling, testability, and maintainability. Triggers on "review my prompts", "are my prompts good", "improve this prompt", "why is the model ignoring instructions", "prompt engineering review".
report-card
Use when the user wants ONE combined quality grade for an AI-agent / LLM-app repo instead of running each review separately — auto-detects which reviews apply, runs them, dedupes overlapping findings, and emits a single overall grade, a per-area scorecard, and a prioritized cross-cutting fix list. Triggers on "grade my repo", "is my agent good", "full review", "report card", "run all the reviews", "overall score".
token-efficiency
Use when the user wants to reduce LLM token usage, context-window pressure, or API cost in an AI/agent codebase without hurting quality — reviewing prompt construction, context assembly, chat-history retention, tool definitions, retrieval, caching, batching, and output verbosity. Triggers on "reduce token usage", "cut LLM costs", "we're burning tokens", "optimize context", "why is this so expensive".
ux-audit
Use when the user wants a UX / UI / interaction-design review or redesign of an app, dashboard, editor, canvas, AI/agentic product, web app, or mobile app — including microinteractions, motion, loading, error recovery, empty states, accessibility, perceived performance, and AI trust/observability. Triggers on "audit my UX", "review the interface", "redesign this flow", "is this UI good", "microinteraction review".
data-visualization
Apply Bertin's *Semiology of Graphics* (visual-encoding grammar) and Tufte's *Visual Display of Quantitative Information* (integrity, density, craft) to any chart, plot, dashboard, infographic, table, map, or network diagram. Use whenever the user designs, builds, reviews, or critiques data visualizations — matplotlib/plotly/d3/ggplot/vega/recharts/chart.js code, BI dashboards (Looker/Tableau/Power BI/Metabase/Superset), report figures, scientific plots, financial charts, exec slides with numbers, or choropleth maps. Trigger even when the user doesn't name Bertin or Tufte and only describes a chart in prose ("make a bar chart of …", "this dashboard feels cluttered", "is this graph misleading?", "what kind of map should I use?"). Walks Bertin's encoding workflow (components → visual variables → perceptual levels), then applies Tufte's filters (Lie Factor, data-ink ratio, chartjunk removal, format choice) for concrete, defensible advice.
Bio shown is the top-scored skill's repo description as a fallback — real GitHub bios land in a future update.