
eval-autoresearch-fit

Trigger with "evaluate autoresearch fit", "score this skill for karpathy loop", "is this a good autoresearch candidate", "assess autoresearch viability for", "which skills are best for autonomous loop optimization", "score skills for 3-file architecture", or when the user wants to determine if a skill is a good candidate for applying the Karpathy autoresearch autonomous optimization loop pattern.
richfrem/agent-plugins-skills · ★ 3 · AI & Automation · score 65
Install: claude install-skill richfrem/agent-plugins-skills
# Evaluate Autoresearch Fit

Assess whether a skill is a viable candidate for the Karpathy 3-File Autoresearch autonomous optimization loop. Scores each skill on four dimensions, proposes what the 3-file architecture would look like, and updates the canonical `summary-ranked-skills.json` via the update script.

## Background

The Karpathy autoresearch pattern requires three conditions simultaneously:

1. **A Clear Metric**: a single number with a clear optimization direction
2. **Automated Evaluation**: no human in the loop; scoring runs headlessly from a shell command
3. **One Editable File**: the agent mutates only a single predefined target per loop

Skills that lack these properties cannot run an effective autonomous loop.

## Data File

The canonical ranked skills list lives at:

```
plugin-research/experiments/analyze-candidates-for-auto-reseaarch/skills/eval-autoresearch-fit/assets/resources/summary-ranked-skills.json
```

After every evaluation, update it with the update script (see Step 5).

## Scoring Dimensions

Each dimension is scored 1-10. Max total = 40.

| Dimension | 10 (Best) | 1 (Worst) |
|---|---|---|
| **Objectivity** | Binary pass/fail or exact numeric output from a shell command | Purely subjective, requires human taste judgment |
| **Execution Speed** | Completes in seconds | Requires 30+ min or human input |
| **Frequency of Use** | Triggered multiple times per day | Rarely needed (monthly or less) |
| **Potential Utility** | Prevents systemic failur