result-to-claim

Solid

Use when experiments complete to judge what claims the results support, what they do not, and what evidence is still missing. A secondary Codex agent evaluates results against intended claims and routes to the next action (pivot, supplement, or confirm). Use after experiments finish - before writing the paper or running ablations.

AI & Automation | 11,977 stars | 1,099 forks | Updated yesterday | MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Result-to-Claim Gate

Experiments produce numbers; this gate decides what those numbers *mean*. Collect results from available sources, get an objective judgment, then route based on the verdict.

## Context: $ARGUMENTS

## Constants

- **REVIEWER_MODEL = `gpt-5.4`** - Used via a secondary Codex agent for objective claim assessment.

## When to Use

- After a set of experiments completes (main results, not just sanity checks)
- Before committing to claims in a paper or review response
- When results are ambiguous and you need an objective second opinion

## Workflow

### Step 1: Collect Results

Gather experiment data from whatever sources are available in the project:

1. **W&B** (preferred): `wandb.Api().run("<entity>/<project>/<run_id>").history()` - metrics, training curves, comparisons
2. **`EXPERIMENT_LOG.md`** - full results table with baselines and verdicts
3. **`EXPERIMENT_TRACKER.md`** - check which experiments are done vs still running
4. **Log files** - `ssh server "tail -100 /path/to/training.log"` if no other source
5. **`docs/research_contract.md`** or project notes - intended claims and experiment design

Assemble the key information:

- What experiments were run (method, dataset, config)
- Main metrics and baseline comparisons (deltas)
- The intended claim these experiments were designed to test
- Any known confounds or caveats

### Step 2: Secondary Codex Judgment

Send the collected results to a secondary Codex agent for objective evaluation:

```text
spa...
```
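The "assemble the key information" part of Step 1 could be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own implementation: `ResultSummary`, `fetch_wandb_history`, and `render_summary` are hypothetical names introduced here, and the only call taken from the skill text is `wandb.Api().run(...).history()`. The W&B import is deferred so the rest of the sketch runs without W&B installed.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSummary:
    """Hypothetical container for the key information listed in Step 1."""
    experiment: str            # method, dataset, config in one short description
    intended_claim: str        # the claim the experiments were designed to test
    metrics: dict[str, float]  # main metrics for the proposed method
    baseline: dict[str, float] # the same metrics for the baseline
    caveats: list[str] = field(default_factory=list)  # known confounds

    def deltas(self) -> dict[str, float]:
        """Method minus baseline, for metrics reported by both."""
        return {
            name: self.metrics[name] - self.baseline[name]
            for name in self.metrics
            if name in self.baseline
        }


def fetch_wandb_history(run_path: str):
    """Fetch a run's logged history; run_path is '<entity>/<project>/<run_id>'."""
    import wandb  # deferred import: only needed when W&B is the chosen source

    return wandb.Api().run(run_path).history()


def render_summary(summary: ResultSummary) -> str:
    """Format the summary as plain text to hand to the reviewer agent."""
    lines = [
        f"Experiment: {summary.experiment}",
        f"Intended claim: {summary.intended_claim}",
        "Metrics (method / baseline / delta):",
    ]
    for name, delta in summary.deltas().items():
        lines.append(
            f"- {name}: {summary.metrics[name]:.4f} / "
            f"{summary.baseline[name]:.4f} / {delta:+.4f}"
        )
    caveats = "; ".join(summary.caveats) if summary.caveats else "none recorded"
    lines.append(f"Caveats: {caveats}")
    return "\n".join(lines)
```

Keeping the deltas and caveats next to the intended claim means the reviewer sees what was meant to be shown alongside what was measured. The metric values would come from whichever Step 1 source is available in the project. The plain-text layout produced by `render_summary` is an assumption made for this sketch, not the format the skill prescribes.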

Details

Author: wanshuiyin
Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Created: 3 months ago
Last Updated: yesterday
Language: Python
License: MIT
