experiment-loop

Solid

Autonomous experiment loop: hypothesize > modify > test > evaluate > keep/discard > repeat. Run N experiments automatically with measurable metrics. Works for performance optimization, A/B testing, prompt engineering, and any measurable improvement task.

AI & Automation 495 stars 41 forks Updated 1 month ago MIT

Quality Score: 86/100

Stars (20%): 90
Recency (20%): 75
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Experiment Loop

Autonomous, iterative improvement inspired by Karpathy's autoresearch methodology. Define a metric, set a target, and let the loop run until the target is met or the iteration limit is reached.

## The 5-Step Loop

```
1. HYPOTHESIZE -> Form a specific, falsifiable improvement hypothesis
2. MODIFY      -> Apply the minimal code/config/prompt change
3. TEST        -> Run the measurement suite (benchmarks, tests, evals)
4. EVALUATE    -> Compare result against baseline and previous best
5. DECIDE      -> KEEP if better, DISCARD (git stash pop --index) if worse
   |
   Repeat until target met OR max_iterations reached
```

Each iteration is atomic: one hypothesis, one change, one measurement, one decision.

## Experiment Definition

Define an experiment in your task or in `thoughts/EXPERIMENTS.md`:

```yaml
experiment:
  name: "reduce-api-latency"
  metric: "p95 response time (ms)"
  baseline: 340
  target: 200
  direction: minimize        # minimize | maximize
  max_iterations: 10         # hard cap, never exceed
  measurement_cmd: "npm run bench:api"
  measurement_key: "p95"     # JSON key from bench output
  scope: "src/api/"          # files the loop is allowed to touch
```

### Key Fields

| Field | Description |
|-------|-------------|
| `metric` | Human-readable name of what you are measuring |
| `baseline` | Measured value before any changes (run this first) |
| `target` | Success condition -- loop exits when this is met |
| `direction`...
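
The keep/discard rule and the two exit conditions described above can be sketched in a few lines of Python. This is an illustrative sketch, not the skill's actual implementation: the function names (`read_metric`, `is_better`, `target_met`, `run_loop`) and the `try_experiment` callback are hypothetical, and the sketch assumes the benchmark prints a flat JSON object from which `measurement_key` is read. Reverting a discarded change (the git step in the loop) is left to the caller.

```python
import json
from typing import Callable, List, Tuple


def read_metric(bench_output: str, key: str) -> float:
    """Extract one numeric value from JSON benchmark output (assumed flat object)."""
    return float(json.loads(bench_output)[key])


def is_better(candidate: float, best: float, direction: str) -> bool:
    """Return True when candidate improves on best for the given direction."""
    if direction == "minimize":
        return candidate < best
    if direction == "maximize":
        return candidate > best
    raise ValueError(f"direction must be 'minimize' or 'maximize', got {direction!r}")


def target_met(value: float, target: float, direction: str) -> bool:
    """Return True when value has reached or passed the target."""
    return value <= target if direction == "minimize" else value >= target


def run_loop(
    baseline: float,
    target: float,
    direction: str,
    max_iterations: int,
    try_experiment: Callable[[int], float],
) -> Tuple[float, List[Tuple[int, float, str]]]:
    """Run up to max_iterations experiments; stop early once the target is met.

    try_experiment(i) is a hypothetical callback that applies one change,
    runs the measurement, and returns the metric value. Undoing a discarded
    change is the caller's responsibility.
    """
    best = baseline
    history: List[Tuple[int, float, str]] = []
    for i in range(1, max_iterations + 1):
        if target_met(best, target, direction):
            break
        value = try_experiment(i)
        keep = is_better(value, best, direction)
        history.append((i, value, "KEEP" if keep else "DISCARD"))
        if keep:
            best = value
    return best, history


if __name__ == "__main__":
    # Made-up measurements standing in for real benchmark runs.
    fake_results = iter([360, 310, 320, 190, 150])
    best, history = run_loop(340, 200, "minimize", 10, lambda i: next(fake_results))
    for iteration, value, decision in history:
        print(iteration, value, decision)
    print("best:", best)
```

With the made-up measurements in the demo, the loop stops after four iterations because 190 is already below the target of 200; the fifth value is never measured.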

Details

Author: vibeeval
Repository: vibeeval/vibecosystem
Created: 2 months ago
Last Updated: 1 month ago
Language: C#
License: MIT

Integrates with

Anthropic · AI

Similar Skills

Semantically similar based on skill content — not just same category

Web & Frontend Listed

experiment

Automated optimization loop with scalar fitness function. Proposes changes in isolated worktrees, measures with a metric command, keeps improvements, discards failures. Supports convergence detection and diminishing returns.

1 Updated today

allysgrandiose674

AI & Automation Listed

autoresearch

Autonomous experiment loop inspired by Karpathy's autoresearch. Iteratively modifies code, runs evaluation, measures a metric, and keeps or discards changes using git. Use when optimizing code against a measurable target (test pass rate, performance, bundle size, model quality, etc).

2 Updated 4 days ago

AI & Automation Featured

autoresearch

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

34,158 Updated yesterday

AI & Automation Solid

autoresearch-agent

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.

16,642 Updated yesterday

AI & Automation Solid

loop

Start an autonomous experiment loop with user-selected interval (10min, 1h, daily, weekly, monthly). Uses CronCreate for scheduling.

16,642 Updated yesterday