setup

Solid

Set up a new autoresearch experiment interactively. Collects domain, target file, eval command, metric, direction, and evaluator.

AI & Automation · 16,642 stars · 2,295 forks · Updated yesterday · MIT


Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# /ar:setup - Create New Experiment

Set up a new autoresearch experiment with all required configuration.

## Usage

```
/ar:setup                      # Interactive mode
/ar:setup engineering api-speed src/api.py "pytest bench.py" p50_ms lower
/ar:setup --list               # Show existing experiments
/ar:setup --list-evaluators    # Show available evaluators
```

## What It Does

### If arguments provided

Pass them directly to the setup script:

```bash
python {skill_path}/scripts/setup_experiment.py \
  --domain {domain} --name {name} \
  --target {target} --eval "{eval_cmd}" \
  --metric {metric} --direction {direction} \
  [--evaluator {evaluator}] [--scope {scope}]
```

### If no arguments (interactive mode)

Collect each parameter one at a time:

1. **Domain.** Ask: "What domain? (engineering, marketing, content, prompts, custom)"
2. **Name.** Ask: "Experiment name? (e.g., api-speed, blog-titles)"
3. **Target file.** Ask: "Which file to optimize?" Verify it exists.
4. **Eval command.** Ask: "How to measure it? (e.g., pytest bench.py, python evaluate.py)"
5. **Metric.** Ask: "What metric does the eval output? (e.g., p50_ms, ctr_score)"
6. **Direction.** Ask: "Is lower or higher better?"
7. **Evaluator (optional).** Show built-in evaluators. Ask: "Use a built-in evaluator, or your own?"
8. **Scope.** Ask: "Store in project (.autoresearch/) or user (~/.autoresearch/)?"

Then run `setup_experiment.py` with the collected p...
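The hand-off from collected answers to the setup script can be sketched in Python. This is a minimal illustration, not the skill's actual implementation: `ExperimentConfig` and `build_setup_command` are hypothetical names, and the only details taken from the skill text are the script path and the flags shown in the command template. Building the command as an argument list rather than a single string keeps an eval command such as `pytest bench.py` intact as one argument.

```python
import shlex
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass
class ExperimentConfig:
    """Hypothetical container for the answers collected in interactive mode."""
    domain: str
    name: str
    target: str
    eval_cmd: str
    metric: str
    direction: str                   # "lower" or "higher"
    evaluator: Optional[str] = None  # optional, per the skill text
    scope: Optional[str] = None      # optional; accepted values are not shown in the source


def build_setup_command(cfg: ExperimentConfig, skill_path: str) -> List[str]:
    """Assemble an argv list for setup_experiment.py from the collected answers."""
    if cfg.direction not in ("lower", "higher"):
        raise ValueError(f"direction must be 'lower' or 'higher', got {cfg.direction!r}")

    script = PurePosixPath(skill_path) / "scripts" / "setup_experiment.py"
    cmd = [
        "python", str(script),
        "--domain", cfg.domain,
        "--name", cfg.name,
        "--target", cfg.target,
        "--eval", cfg.eval_cmd,      # stays a single argument, spaces included
        "--metric", cfg.metric,
        "--direction", cfg.direction,
    ]
    # Optional flags are appended only when a value was collected.
    if cfg.evaluator:
        cmd += ["--evaluator", cfg.evaluator]
    if cfg.scope:
        cmd += ["--scope", cfg.scope]
    return cmd


if __name__ == "__main__":
    cfg = ExperimentConfig(
        domain="engineering",
        name="api-speed",
        target="src/api.py",
        eval_cmd="pytest bench.py",
        metric="p50_ms",
        direction="lower",
    )
    # "/path/to/skill" is a placeholder for {skill_path}.
    print(shlex.join(build_setup_command(cfg, "/path/to/skill")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The target-file existence check from step 3 is left out of the sketch and would run before the command is built.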

Details

Author: alirezarezvani
Repository: alirezarezvani/claude-skills
Created: 7 months ago
Last Updated: yesterday
Language: Python
License: MIT

Integrates with

OpenAI (AI) · Anthropic (AI) · pytest (Testing)

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

autoresearch-agent

Autonomous experiment loop that optimizes any file by a measurable metric. Inspired by Karpathy's autoresearch. The agent edits a target file, runs a fixed evaluation, keeps improvements (git commit), discards failures (git reset), and loops indefinitely. Use when: user wants to optimize code speed, reduce bundle/image size, improve test pass rate, optimize prompts, improve content quality (headlines, copy, CTR), or run any measurable improvement loop. Requires: a target file, an evaluation command that outputs a metric, and a git repo.

16,642 Updated yesterday

AI & Automation Listed

autoresearch

Autonomous experiment loop inspired by Karpathy's autoresearch. Iteratively modifies code, runs evaluation, measures a metric, and keeps or discards changes using git. Use when optimizing code against a measurable target (test pass rate, performance, bundle size, model quality, etc).

2 Updated 4 days ago

AI & Automation Listed

autoresearch

Check and run autonomous experiments. Query experiment status, view results dashboards, and execute iterations. TRIGGER when: user asks about experiment status, autoresearch progress, "how's the experiment going", "run another iteration", or invokes "/autoresearch". DO NOT TRIGGER when: user is working on autoresearch agent code itself.

1 Updated 1 week ago

AI & Automation Featured

autoresearch

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

34,158 Updated yesterday

AI & Automation Solid

run

Run a single experiment iteration. Edit the target file, evaluate, keep or discard.

16,642 Updated yesterday