Install: claude install-skill Galileo-Agent-Labs/eval-engineer
# Eval Diagnose
Use this skill for evidence-backed root-cause analysis (RCA) once a debug packet,
URL-derived evidence, or trace/session/log-stream context is available.
## Required References
Use `skills/eval-engineer/references/rca-recipe.md`,
`skills/eval-engineer/references/debug-packets.md`,
`skills/eval-engineer/references/evidence-provenance.md`, and
`skills/eval-engineer/assets/diagnosis-template.md`.
## Do
- Start from fetched evidence, not source-code guesses.
- Name the failing metric contract and what it proves.
- Label hosted Galileo evidence separately from local deterministic packets
before making metric or score claims.
- Inspect traces, spans, sessions, tool calls, retrieval context, and scorer
  status before deciding where the fix belongs.
- Classify the fix surface as one of: prompt, tool schema, adapter, retriever,
  ranker, guardrail, metric, dataset, or SDK wiring.
- Write a diagnosis and a bounded fix plan only when the evidence supports them.
- Honor read-only requests. If the user says "read-only", "dry run", "no edits",
  or "do not edit files", do not write `.galileo/` artifacts. Return the RCA
  inline and include a short "Would write" list for any suggested artifact
  paths.
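The evidence-labeling, fix-surface, and read-only rules above can be sketched as plain Python data types. This is a minimal illustration under stated assumptions: every name here (`Provenance`, `FixSurface`, `Evidence`, `Diagnosis`, `render`, `finalize`, and the `.galileo/diagnosis.md` path) is hypothetical and is not part of any Galileo SDK or of this skill's actual tooling. Read-only detection is assumed to be a simple case-insensitive match on the four phrases listed above.

```python
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical types mirroring the skill's vocabulary; not a Galileo API.

class Provenance(Enum):
    HOSTED_GALILEO = "hosted Galileo evidence"
    LOCAL_PACKET = "local deterministic packet"

class FixSurface(Enum):
    PROMPT = "prompt"
    TOOL_SCHEMA = "tool schema"
    ADAPTER = "adapter"
    RETRIEVER = "retriever"
    RANKER = "ranker"
    GUARDRAIL = "guardrail"
    METRIC = "metric"
    DATASET = "dataset"
    SDK_WIRING = "SDK wiring"

# Assumption: read-only intent is detected by plain phrase matching.
READ_ONLY_PHRASES = ("read-only", "dry run", "no edits", "do not edit files")

@dataclass
class Evidence:
    source: str              # hypothetical trace/span/session identifier
    provenance: Provenance
    note: str

@dataclass
class Diagnosis:
    metric_contract: str
    fix_surface: FixSurface
    evidence: list[Evidence]
    artifact_paths: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Refuse to build a diagnosis that no evidence supports.
        if not self.evidence:
            raise ValueError("diagnosis requires at least one evidence item")

def is_read_only(request: str) -> bool:
    text = request.lower()
    return any(phrase in text for phrase in READ_ONLY_PHRASES)

def render(d: Diagnosis) -> str:
    lines = [f"Metric contract: {d.metric_contract}",
             f"Fix surface: {d.fix_surface.value}"]
    # Group evidence by provenance so hosted and local results stay labeled.
    for prov in Provenance:
        items = [e for e in d.evidence if e.provenance is prov]
        if items:
            lines.append(f"[{prov.value}]")
            lines.extend(f"- {e.source}: {e.note}" for e in items)
    return "\n".join(lines)

def finalize(d: Diagnosis, request: str) -> tuple[str, list[str]]:
    """Return (inline report, paths the caller may actually write)."""
    report = render(d)
    if is_read_only(request):
        if d.artifact_paths:
            report += "\nWould write:\n" + "\n".join(f"- {p}" for p in d.artifact_paths)
        return report, []  # read-only: nothing is written
    return report, list(d.artifact_paths)
```

Returning paths instead of writing them keeps the read-only decision in one place: a caller would create files only from the second element of the tuple, so a "dry run" request yields an inline report plus a "Would write" list and no side effects.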
## Gotchas
- Fetched debug packets are the RCA source of truth when scorer jobs are still
settling or runner output disagrees with fetched metrics.
- A prompt diff, local score, or code diff is not proof of improvement without
before/after Galileo evidence.
- Bare correctness or factuality can be a smoke test only. Prefer the