
autoresearch

Karpathy's autoresearch: autonomous ratcheting optimization loops for any artifact. A human writes `program.md`; the agent runs experiments with git-backed keep/revert. Trigger on "optimize this", "make this better", "iterate on", "autoresearch", "loop on this", "A/B test", "find the best version", Karpathy's loop, experiment loops, hill climbing, the ratchet pattern, or `program.md` workflows. Works across code, prompts, content, models, and configs.
Evarodenas/autoresearch · ★ 0 · AI & Automation · score 72
Install: `claude install-skill Evarodenas/autoresearch`
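The keep/revert loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the skill's own implementation. Git is replaced by an in-memory `history` list: each kept artifact plays the role of a commit, and a revert simply discards the candidate. The "artifact" is a small vector of numbers, and `ratchet`, `Experiment`, and `to_tsv` are hypothetical names. The `evaluate` function is fixed before the loop starts and is only ever called inside it, which mirrors the frozen metric.

```python
from __future__ import annotations

import random
from dataclasses import dataclass
from typing import Callable


@dataclass
class Experiment:
    """One row of the experiment log (hypothetical schema)."""
    idx: int
    change: str
    score: float
    status: str  # "keep" or "revert"


def ratchet(
    artifact: list[float],
    evaluate: Callable[[list[float]], float],
    n_experiments: int,
    seed: int = 0,
) -> tuple[list[float], list[list[float]], list[Experiment]]:
    """Run a keep/revert loop; `history` stands in for the git commit log."""
    rng = random.Random(seed)
    best = list(artifact)
    best_score = evaluate(best)  # baseline score on the frozen metric
    history = [list(best)]       # "commit 0": the baseline artifact
    log = [Experiment(0, "baseline", best_score, "keep")]

    for i in range(1, n_experiments + 1):
        # One change per experiment: perturb a single coordinate of a copy.
        candidate = list(best)
        j = rng.randrange(len(candidate))
        delta = rng.uniform(-1.0, 1.0)
        candidate[j] += delta

        # `evaluate` is only called, never modified: the frozen metric.
        score = evaluate(candidate)
        if score > best_score:
            # Strict improvement: keep it (the analogue of committing).
            best, best_score = candidate, score
            history.append(list(best))
            status = "keep"
        else:
            # No improvement: revert by discarding the candidate.
            status = "revert"
        # Every experiment is logged, failures included.
        log.append(Experiment(i, f"shift x[{j}] by {delta:+.3f}", score, status))

    return best, history, log


def to_tsv(log: list[Experiment]) -> str:
    """Render the log as tab-separated rows, header first."""
    rows = ["experiment\tchange\tscore\tstatus"]
    rows += [f"{e.idx}\t{e.change}\t{e.score:.6f}\t{e.status}" for e in log]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Toy frozen metric (higher is better), peaking at x = (3, -2).
    def evaluate(x: list[float]) -> float:
        return -((x[0] - 3) ** 2 + (x[1] + 2) ** 2)

    best, history, log = ratchet([0.0, 0.0], evaluate, n_experiments=200)
    kept = sum(e.status == "keep" for e in log)
    print(f"kept {kept} of {len(log)} experiments; best = {best}")
    print(to_tsv(log).splitlines()[0])
```

This sketch reverts ties, so every entry that survives in `history` is a strict improvement on the metric. In a real session, the `history.append` step would correspond to a git commit, the revert branch would correspond to resetting the working tree, and each row from `to_tsv` would be appended to `results.tsv`.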
# Autoresearch Skill

A faithful implementation of Karpathy's autoresearch: a ratcheting git loop where the human programs the research organization and the agent runs experiments autonomously. Every surviving commit represents a genuine improvement. Failed experiments revert cleanly and get logged for context.

## Core Philosophy

From Karpathy:

*"You're not touching any of the Python files like you normally would as a researcher. Instead, you are programming the `program.md` Markdown files that provide context to the AI agents and set up your autonomous research org."*

This means:

- **The human writes `program.md`**: strategy, constraints, metrics, and search space.
- **The agent edits the target artifact**: one change per experiment, then keep or revert.
- **`results.tsv` accumulates**: every experiment is logged, failures included.
- **Git enforces the ratchet**: only improving commits survive, so the kept history improves monotonically on the metric.

## Architecture: Three Files + A Frozen Metric

| File | Role | Who edits | Mutable? |
|---|---|---|---|
| `program.md` | Research org specification | Human only | By human |
| Target artifact | The thing being optimized | Agent only | Per experiment |
| `evaluate.*` | How success is measured | Nobody during a session | Frozen |
| `results.tsv` | Experiment log | Agent appends | Append-only |

The separation between the **frozen evaluation** and the **mutable target** is the critical trust boundary. The agent cannot game its metric by modifying how success is measured. This is what make