autoresearchlisted
Install: claude install-skill Evarodenas/autoresearch
# Autoresearch Skill
A faithful implementation of Karpathy's autoresearch — a ratcheting git loop
where the human programs the research organization and the agent runs
experiments autonomously. Every surviving commit represents a genuine
improvement. Failed experiments revert cleanly and get logged for context.
## Core Philosophy
From Karpathy: *"You're not touching any of the Python files like you normally
would as a researcher. Instead, you are programming the `program.md` Markdown
files that provide context to the AI agents and set up your autonomous
research org."*
This means:
- **The human writes `program.md`** — strategy, constraints, metrics, search space
- **The agent edits the target artifact** — one change per experiment, keep or revert
- **`results.tsv` accumulates** — every experiment logged, failures included
- **Git enforces the ratchet** — history is monotonically improving
## Architecture: Three Files + A Frozen Metric
| File | Role | Who edits | Mutable? |
|---|---|---|---|
| `program.md` | Research org specification | Human only | By human |
| Target artifact | The thing being optimized | Agent only | Per experiment |
| `evaluate.*` | How success is measured | Nobody during a session | Frozen |
| `results.tsv` | Experiment log | Agent appends | Append-only |
The separation between the **frozen evaluation** and the **mutable target** is
the critical trust boundary. The agent cannot game its metric by modifying how
success is measured. This is what make