autoresearch

Solid

Autonomous experiment loop inspired by Karpathy's autoresearch. Iteratively modifies code, runs an evaluation, measures a metric, and keeps or discards each change using git. Use when optimizing code against a measurable target (test pass rate, performance, bundle size, model quality, etc.).

AI & Automation · 3 stars · 1 fork · Updated yesterday · Apache-2.0

Quality Score: 79/100

Stars (weight 20%): 20
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 80
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# Autoresearch: Autonomous Experiment Loop

You are an autonomous researcher. Your job is to iteratively improve code by running experiments, measuring results, and keeping only improvements. You operate on a dedicated git branch and never stop until manually interrupted.

## Setup Phase

Parse arguments from `$ARGUMENTS`. The user must provide at least an `eval_command`. Prompt for anything missing before starting the loop.

### Required

- **eval_command**: The command to evaluate an experiment (e.g. `npm test`, `uv run train.py`, `swift build`)

### Optional (prompt if not provided, offer sensible defaults)

- **metric**: A grep pattern to extract the metric from eval output (e.g. `^val_bpb:`, `Tests:.*passed`, `bundle size`)
  - If not provided, default to exit code (0 = pass, nonzero = fail)
- **target_files**: Glob or list of files you may modify (e.g. `src/model.ts`, `train.py`)
  - If not provided, ask the user which files are in scope
- **readonly_files**: Files to read for context but never modify
  - If not provided, infer from the project (README, config files, test fixtures)
- **tag**: Branch suffix (default: today's date, e.g. `mar22`)
- **direction**: `lower` (minimize metric), `higher` (maximize), or `pass` (binary pass/fail). Default: `pass`
- **budget**: Max wall-clock minutes per experiment. Default: `5`

### Initialization Steps

1. **Confirm git is clean**: `git status` must show a clean working tree. If dirty, ask the user to commit or stash.
2. **Create ...
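As a rough illustration of how the setup parameters above (`eval_command`, `metric`, `direction`, `budget`) might fit together, here is a minimal Python sketch. It is not taken from the skill itself: the function names `run_eval`, `extract_metric`, and `is_improvement` are hypothetical, and it assumes the metric value is the first number on the first output line matching the pattern, and that a timed-out run counts as a failure.

```python
import re
import subprocess
from typing import Optional, Tuple


def run_eval(eval_command: str, budget_minutes: float = 5.0) -> Tuple[int, str]:
    """Run the eval command under a wall-clock budget; return (exit_code, output)."""
    try:
        proc = subprocess.run(
            eval_command,
            shell=True,
            capture_output=True,
            text=True,
            timeout=budget_minutes * 60,
        )
    except subprocess.TimeoutExpired:
        # Assumption: a run that exceeds the budget is treated as a failure.
        return 124, ""
    return proc.returncode, proc.stdout + proc.stderr


def extract_metric(output: str, pattern: str) -> Optional[float]:
    """Return the first number on the first line matching `pattern`, else None."""
    for line in output.splitlines():
        if re.search(pattern, line):
            number = re.search(r"[-+]?\d+(?:\.\d+)?", line)
            if number:
                return float(number.group())
    return None


def is_improvement(
    direction: str,
    exit_code: int,
    new: Optional[float],
    best: Optional[float],
) -> bool:
    """Decide whether an experiment should be kept.

    direction is "lower", "higher", or "pass" (binary, based on exit code).
    """
    if direction == "pass":
        return exit_code == 0
    if new is None:
        # No metric could be extracted, so there is nothing to compare.
        return False
    if best is None:
        # First successful measurement becomes the baseline.
        return True
    return new < best if direction == "lower" else new > best
```

A caller would run `run_eval`, pass the captured output to `extract_metric`, and keep the change only when `is_improvement` returns `True`. How the change is then committed or reverted in git is left out here, because that part of the skill text is truncated.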

Details

Author: Silex-Research
Repository: Silex-Research/DontPanic
Created: 4 months ago
Last Updated: yesterday
Language: Python
License: Apache-2.0

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

autoresearch

Autonomous experiment loop: edit code, commit, run benchmark, extract metrics, keep improvements or revert, repeat forever. Use this skill when the user asks to "run autoresearch", "start an experiment loop", "optimize a metric autonomously", "autonomous experiments", "benchmark loop", "keep/discard experiments", "optimize test speed", "optimize bundle size", "optimize build time", "run experiments overnight", "speed up my tests", "make my build faster", "reduce compile time", "keep trying until it's faster", "run experiments while I sleep", "overnight optimization", "edit-measure-keep loop", "autoresearch status", or mentions "autoresearch", "experiment loop", "autonomous optimization". Always use this skill when the user wants to iteratively and autonomously improve any measurable metric — even if they don't use the word "autoresearch". Also use when the user asks about the status of a running autoresearch session or wants to cancel/stop one.

12 Updated 1 week ago

AI & Automation Solid

autoresearch

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

14 Updated yesterday

AI & Automation Featured

autoresearch-agent

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.

23,342 Updated 1 week ago