
llm-evaluation

Model output quality assessment, hallucination detection, benchmark suites. [EXPLICIT] Trigger: "llm evaluation"
JaviMontano/jm-adk-alfa · ★ 1 · AI & Automation · score 71
Install: claude install-skill JaviMontano/jm-adk-alfa
# LLM Evaluation

> "Method over hacks."

## TL;DR

Model output quality assessment, hallucination detection, and benchmark suites. [EXPLICIT]

## Procedure

### Step 1: Discover
- Gather context and requirements

### Step 2: Analyze
- Evaluate options per Constitution XIII/XIV

### Step 3: Execute
- Implement with evidence tags

### Step 4: Validate
- Verify that the quality criteria are met

## Quality Criteria

- [ ] Evidence tags applied
- [ ] Constitution-compliant
- [ ] Actionable output

## Usage

Example invocations:

- "/llm-evaluation": run the full LLM evaluation workflow
- "llm evaluation on this project": apply the workflow to the current context

## Assumptions & Limits

- Assumes access to project artifacts (code, docs, configs) [EXPLICIT]
- Produces English-language output unless otherwise specified [EXPLICIT]
- Does not replace domain expert judgment for final decisions [EXPLICIT]

## Edge Cases

| Scenario | Handling |
|----------|----------|
| Empty or minimal input | Request clarification before proceeding |
| Conflicting requirements | Flag conflicts explicitly and propose a resolution |
| Out-of-scope request | Redirect to the appropriate skill or escalate |
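The skill names three activities (quality assessment, hallucination detection, benchmark suites) but ships no code. The following is a minimal Python sketch of how they might fit together, assuming a model exposed as a plain `str -> str` callable and a hypothetical `Case` record holding a prompt, a reference answer, and the source context the answer should be grounded in. Exact match and SQuAD-style token F1 are common reference-based metrics; `unsupported_ratio` is a deliberately crude lexical stand-in for hallucination detection, not the skill's own method. All names (`Case`, `run_suite`, `max_unsupported`) are illustrative assumptions.

```python
import string
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


# Hypothetical benchmark case format; the skill does not prescribe one.
@dataclass
class Case:
    prompt: str
    reference: str  # gold answer
    context: str    # source text the answer should be grounded in


def normalize(text: str) -> List[str]:
    """Lowercase, strip ASCII punctuation, and split on whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return text.split()


def exact_match(pred: str, ref: str) -> float:
    return float(normalize(pred) == normalize(ref))


def token_f1(pred: str, ref: str) -> float:
    """Token-overlap F1 between prediction and reference (SQuAD-style)."""
    p, r = normalize(pred), normalize(ref)
    if not p or not r:
        return float(p == r)
    common = sum((Counter(p) & Counter(r)).values())  # multiset intersection
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(r)
    return 2 * precision * recall / (precision + recall)


def unsupported_ratio(pred: str, context: str) -> float:
    """Crude hallucination signal: share of answer tokens absent from the context."""
    p = normalize(pred)
    if not p:
        return 0.0
    ctx = set(normalize(context))
    return sum(tok not in ctx for tok in p) / len(p)


def run_suite(model: Callable[[str], str], cases: List[Case],
              max_unsupported: float = 0.5) -> dict:
    if not cases:
        # Mirrors the "empty or minimal input" edge case: ask for data
        # instead of reporting scores over nothing.
        raise ValueError("empty benchmark suite: request cases before evaluating")
    rows = []
    for case in cases:
        pred = model(case.prompt)
        rows.append({
            "em": exact_match(pred, case.reference),
            "f1": token_f1(pred, case.reference),
            "flagged": unsupported_ratio(pred, case.context) > max_unsupported,
        })
    n = len(rows)
    return {
        "exact_match": sum(r["em"] for r in rows) / n,
        "f1": sum(r["f1"] for r in rows) / n,
        "hallucination_flags": sum(r["flagged"] for r in rows),
        "n": n,
    }


if __name__ == "__main__":
    cases = [
        Case("Capital of France?", "Paris", "Paris is the capital of France."),
        Case("Who wrote Hamlet?", "William Shakespeare",
             "Hamlet is a tragedy by William Shakespeare."),
    ]
    # Stub "model": canned answers standing in for a real LLM call.
    answers = {"Capital of France?": "Paris",
               "Who wrote Hamlet?": "Christopher Marlowe wrote it"}
    print(run_suite(lambda prompt: answers[prompt], cases))
    # {'exact_match': 0.5, 'f1': 0.5, 'hallucination_flags': 1, 'n': 2}
```

With these stub answers, the first case scores perfectly, while the second scores zero and is flagged because none of its tokens appear in the supplied context. A lexical check like this misses paraphrased fabrications and can flag legitimate paraphrases, so in practice it would likely be swapped for an entailment model or a judge-model grader; the sketch only shows where such a check plugs into the Execute and Validate steps.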