shipkit-semantic-qa

Solid

Semantic QA — define inputs/criteria, generate test scripts, Claude judges outputs or screenshots against criteria. Also scores a built app's fidelity (completeness + essence) against captured intent. Triggers: 'semantic qa', 'quality check', 'visual qa', 'judge outputs', 'QA suite', 'fidelity scorecard', 'score fidelity'.

Testing & QA · 1 star · 0 forks · Updated 1 week ago · MIT

Quality Score: 78/100

Stars (20%)
Recency (20%)
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100
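
The breakdown above can be read as a weighted sum. The short sketch below checks the arithmetic under the assumption (not stated on the page) that the overall score is the weight-averaged sub-score; only the three sub-scores actually shown are used, and the unshown ones are left unknown rather than guessed.

```python
# Weights as listed in the breakdown (percent of the overall score).
weights = {
    "Stars": 20, "Recency": 20, "Frontmatter": 20, "Documentation": 15,
    "Issue Health": 10, "License": 10, "Description": 5,
}
# Sub-scores that the page actually displays; the others are not shown.
shown = {"Documentation": 100, "License": 100, "Description": 100}
overall = 78  # "Quality Score: 78/100"

# Points contributed by the visible sub-scores (assumes a weighted average).
known_points = sum(weights[name] * score for name, score in shown.items()) / 100

# Points that must come from the four components whose scores are not shown.
remaining_points = overall - known_points
remaining_weight = sum(w for name, w in weights.items() if name not in shown)

print(f"visible components contribute {known_points:.0f} points")
print(f"unshown components contribute {remaining_points:.0f} of a possible {remaining_weight}")
```

Under that assumption, the visible components account for 30 points and the remaining 48 come from Stars, Recency, Frontmatter, and Issue Health combined.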

Skill Content

# shipkit-semantic-qa - Semantic Quality Assurance

**Purpose**: Define test inputs and quality criteria, generate test scripts, run them, and let Claude semantically judge outputs (API responses or UI screenshots) against human-defined criteria.

**Pattern**: One skill, one loop: Setup → Run → Judge. Two suite types: backend (API/LLM pipeline) and frontend (visual components).

---

## When to Invoke

**User triggers:**
- "Semantic QA", "Set up QA", "Quality check"
- "Visual QA", "Screenshot QA", "Check my UI"
- "Judge outputs", "Run QA suite", "Check quality"
- "Set up quality criteria", "Define test inputs"

**Workflow position:**
- After features are implemented (something to test)
- Before verify/preflight (catches quality issues early)
- Can run standalone against any API or UI

---

## Prerequisites

**Required:** None (Setup mode creates everything)

**Helpful:**
- `.shipkit/stack.json`: Tech stack informs script generation
- `.shipkit/specs/`: Acceptance criteria can seed quality criteria
- Playwright installed (for frontend suites only)

---

## Process

### Completion Tracking

In `--full` mode (all 3 phases sequential), create tasks at the start:
- `TaskCreate`: "Setup: Define criteria + generate test script"
- `TaskCreate`: "Run: Execute tests + verify output count"
- `TaskCreate`: "Judge: Evaluate ALL outputs against ALL criteria"
- `TaskCreate`: "Write judgment.md + judgment.json"

`TaskUpdate` each task to `in_progress` when starting it, `completed` when do...
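
The Setup → Run → Judge loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `Criterion`, `Verdict`, `fake_api`, `stub_judge`, and the report shape are all hypothetical, since the excerpt does not show the generated scripts or the schema of `judgment.json`. The stub judge uses keyword checks only so that the sketch runs offline; in the skill that role is played by Claude judging semantically.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Criterion:
    id: str
    description: str  # human-defined quality criterion, in plain language


@dataclass
class Verdict:
    input_id: str
    criterion_id: str
    passed: bool
    reason: str


# A judge takes one output and one criterion and returns (passed, reason).
Judge = Callable[[str, Criterion], tuple[bool, str]]


def setup() -> tuple[dict[str, str], list[Criterion]]:
    """Setup phase: define test inputs and quality criteria (hypothetical data)."""
    inputs = {"greet-1": "Say hello to Ada", "greet-2": "Say hello to Lin"}
    criteria = [
        Criterion("polite", "The output contains a greeting"),
        Criterion("named", "The output addresses the user by name"),
    ]
    return inputs, criteria


def run(inputs: dict[str, str], target: Callable[[str], str]) -> dict[str, str]:
    """Run phase: execute every input, then verify the output count."""
    outputs = {input_id: target(prompt) for input_id, prompt in inputs.items()}
    if len(outputs) != len(inputs):
        raise RuntimeError("output count does not match input count")
    return outputs


def judge_all(outputs: dict[str, str], criteria: list[Criterion], judge: Judge) -> list[Verdict]:
    """Judge phase: evaluate ALL outputs against ALL criteria."""
    return [
        Verdict(input_id, c.id, *judge(output, c))
        for input_id, output in outputs.items()
        for c in criteria
    ]


def fake_api(prompt: str) -> str:
    """Hypothetical system under test: greets the last word of the prompt."""
    return f"Hello, {prompt.split()[-1]}!"


def stub_judge(output: str, criterion: Criterion) -> tuple[bool, str]:
    """Offline placeholder for the semantic judge (assumption: keyword checks)."""
    checks = {
        "polite": "hello" in output.lower(),
        "named": any(name in output for name in ("Ada", "Lin")),
    }
    passed = checks[criterion.id]
    return passed, f"{criterion.description}: {'met' if passed else 'not met'}"


inputs, criteria = setup()
outputs = run(inputs, fake_api)
verdicts = judge_all(outputs, criteria, stub_judge)
report = {
    "total": len(verdicts),
    "passed": sum(v.passed for v in verdicts),
    "verdicts": [asdict(v) for v in verdicts],
}
print(json.dumps(report, indent=2))  # hypothetical stand-in for judgment.json
```

Keeping the judge as a plain callable means the same loop could serve both suite types: a backend suite would pass API responses as `output`, while a frontend suite would pass a reference to a screenshot instead.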

Details

Author: stefan-stepzero
Repository: stefan-stepzero/shipkit
Created: 7 months ago
Last Updated: 1 week ago
Language: Python
License: MIT

Integrates with

Anthropic (AI) · Playwright (Testing)

Similar Skills

Semantically similar based on skill content — not just same category

Testing & QA Solid

shipkit-qa-visual

Visual QA using Playwright as a browser automation library. --setup installs Playwright and creates ui-goals.json; default mode writes inline scripts to navigate, screenshot, and report against goals.

1 Updated 1 week ago

stefan-stepzero

Testing & QA Solid

shipkit-test-cases

Generate and maintain code-anchored test case specifications. Use when setting up test coverage, reviewing what to test, or before team implementation.

1 Updated 1 week ago

stefan-stepzero

Code & Development Solid

shipkit-review-shipping

Review changes across 12 quality dimensions and report findings. Use after a chunk of work or before commit.

1 Updated 1 week ago

stefan-stepzero