autoresearch

Solid

Autonomous experiment loops that hill-climb a measurable metric: apply one change, measure, keep it only if the number improved, revert if not, and repeat unattended. It also supports deep multi-perspective research that produces a saved report, and research-then-optimize when no metric exists yet.

AI & Automation · 3 stars · 1 fork · Updated 2 days ago · MIT

Quality Score: 82/100

Stars (20%)
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Autoresearch

An autonomous agent that finds improvements through measured experiments or deep research. Based on Karpathy's autoresearch pattern: separate what the human controls (strategy) from what the agent controls (execution), then let the agent iterate indefinitely with objective verification.

## Choosing a Mode

| Mode | Command | When to use |
|------|---------|-------------|
| **Optimize** | `/autoresearch optimize` | There is code/config/prompt + a way to measure quality. Find improvements autonomously. |
| **Research** | `/autoresearch research` | Deep, multi-source research on a topic with synthesis. |
| **Improve** | `/autoresearch improve` | Improve something without a clear starting point. Research best practices first, then apply via the optimize loop. |

When no mode is specified, infer from context: metric or benchmark mentioned → Optimize. Question or topic exploration → Research. Wants something "better" without a defined measure → Improve.

---

## Mode 1: Optimize (Experiment Loop)

The core Karpathy pattern. A hill-climbing ratchet where only measurable improvements accumulate.

### Step 1: Configure the Experiment

Before looping, establish four components. Ask the user to confirm if anything is ambiguous, but if the project structure makes the answers obvious, just proceed.

| Component | What it is | Example |
|-----------|-----------|---------|
| **Truth Layer** | Read-only files that define correctness: tests, specs, data, eval harness. The a...
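
The keep-or-revert ratchet described under Mode 1 can be illustrated with a minimal Python sketch. This is not the skill's actual implementation: `hill_climb`, `toy_metric`, and `toy_propose` are hypothetical names, the metric is a made-up function, and "revert" is modeled simply as discarding a candidate copy rather than undoing changes in a real repository.

```python
import copy
import random
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]


def hill_climb(
    config: Config,
    measure: Callable[[Config], float],
    propose: Callable[[Config, random.Random], Config],
    iterations: int,
    seed: int = 0,
) -> Tuple[Config, float, List[Tuple[int, float, bool]]]:
    """Ratchet loop: a candidate is kept only if it strictly improves the metric.

    Higher scores are treated as better (an assumption of this sketch).
    """
    rng = random.Random(seed)
    best = copy.deepcopy(config)          # never mutate the caller's config
    best_score = measure(best)            # baseline measurement
    log: List[Tuple[int, float, bool]] = []
    for i in range(iterations):
        candidate = propose(copy.deepcopy(best), rng)  # apply one change
        score = measure(candidate)                     # measure it
        kept = score > best_score
        if kept:
            best, best_score = candidate, score        # keep the improvement
        # Otherwise the candidate is dropped, which plays the role of "revert".
        log.append((i, score, kept))
    return best, best_score, log


def toy_metric(cfg: Config) -> float:
    # Hypothetical metric for illustration: highest when x equals 3.0.
    return -(cfg["x"] - 3.0) ** 2


def toy_propose(cfg: Config, rng: random.Random) -> Config:
    # Hypothetical single change: nudge x by a small random amount.
    cfg["x"] += rng.uniform(-1.0, 1.0)
    return cfg


if __name__ == "__main__":
    best, score, log = hill_climb({"x": 0.0}, toy_metric, toy_propose, iterations=200)
    print("best config:", best)
    print("best score:", score)
    print("kept changes:", sum(kept for _, _, kept in log))
```

Because only strict improvements replace `best`, the score can never get worse across iterations, which is the "ratchet" property. In a real run the measurement would come from the read-only truth layer rather than from a function the loop itself could modify.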

Details

Author: air-gapped
Repository: air-gapped/skills
Created: 3 months ago
Last Updated: 2 days ago
Language: Python
License: MIT

Integrates with

Anthropic (AI) · Kubernetes (Infrastructure)

Bundled in these plugins

skills

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

autoresearch

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

14 Updated yesterday

a-tokyo

AI & Automation Solid

autoresearch

Autonomous experiment loop: edit code, commit, run benchmark, extract metrics, keep improvements or revert, repeat forever. Use this skill when the user asks to "run autoresearch", "start an experiment loop", "optimize a metric autonomously", "autonomous experiments", "benchmark loop", "keep/discard experiments", "optimize test speed", "optimize bundle size", "optimize build time", "run experiments overnight", "speed up my tests", "make my build faster", "reduce compile time", "keep trying until it's faster", "run experiments while I sleep", "overnight optimization", "edit-measure-keep loop", "autoresearch status", or mentions "autoresearch", "experiment loop", "autonomous optimization". Always use this skill when the user wants to iteratively and autonomously improve any measurable metric — even if they don't use the word "autoresearch". Also use when the user asks about the status of a running autoresearch session or wants to cancel/stop one.

12 Updated 1 week ago

proyecto26

AI & Automation Listed

autoresearch

Autonomous goal-directed iteration loop that continuously improves prompts, templates, configs, or code. Two evaluation modes — deterministic (eval.py with proxy heuristics) or AI judge (LLM rubric scoring). Uses four-way separation in both modes. Inspired by Karpathy's autoresearch.

45 Updated 5 days ago

naveedharri