result-to-claim

Solid

Use when experiments complete to judge what claims the results support, what they do not, and what evidence is still missing. A secondary Codex agent evaluates results against intended claims and routes to the next action (pivot, supplement, or confirm). Use after experiments finish - before writing the paper or running ablations.

AI & Automation 11,977 stars 1099 forks Updated yesterday MIT

Quality Score: 96/100

Stars (20%): 100

Recency (20%): 100

Frontmatter (20%)

Documentation (15%): 100

Issue Health (10%)

License (10%): 100

Description (5%): 100

Skill Content

# Result-to-Claim Gate

Experiments produce numbers; this gate decides what those numbers *mean*. Collect results from available sources, get an objective judgment, then route based on the verdict.

## Context: $ARGUMENTS

## Constants

- **REVIEWER_MODEL = `gpt-5.4`** - Used via a secondary Codex agent for objective claim assessment.

## When to Use

- After a set of experiments completes (main results, not just sanity checks)
- Before committing to claims in a paper or review response
- When results are ambiguous and you need an objective second opinion

## Workflow

### Step 1: Collect Results

Gather experiment data from whatever sources are available in the project:

1. **W&B** (preferred): `wandb.Api().run("<entity>/<project>/<run_id>").history()` - metrics, training curves, comparisons
2. **`EXPERIMENT_LOG.md`** - full results table with baselines and verdicts
3. **`EXPERIMENT_TRACKER.md`** - check which experiments are done vs still running
4. **Log files** - `ssh server "tail -100 /path/to/training.log"` if no other source
5. **`docs/research_contract.md`** or project notes - intended claims and experiment design

Assemble the key information:

- What experiments were run (method, dataset, config)
- Main metrics and baseline comparisons (deltas)
- The intended claim these experiments were designed to test
- Any known confounds or caveats

### Step 2: Secondary Codex Judgment

Send the collected results to a secondary Codex agent for objective evaluation:

```text
spa...
```
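As a rough illustration of Step 1's "assemble the key information" and of routing on the verdict, the sketch below builds a plain-text summary with baseline deltas and maps a verdict to a next action. It is not taken from the skill itself: `ExperimentResult`, `build_review_packet`, `route_verdict`, and every field name are hypothetical. The listing names the actions (pivot, supplement, confirm) and the values `claim_supported = yes` and `partial`; the value `no` and the pairing of values to actions are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ExperimentResult:
    """One finished experiment. All field names are hypothetical."""
    method: str
    dataset: str
    metric: str
    value: float
    baseline: float
    higher_is_better: bool = True

    @property
    def delta(self) -> float:
        # Signed improvement over the baseline: positive always means "better".
        raw = self.value - self.baseline
        return raw if self.higher_is_better else -raw


def build_review_packet(
    claim: str,
    results: Iterable[ExperimentResult],
    caveats: Iterable[str] = (),
) -> str:
    """Assemble claim, results with deltas, and caveats as plain text."""
    lines = [f"Intended claim: {claim}", "", "Results:"]
    for r in results:
        lines.append(
            f"- {r.method} on {r.dataset}: {r.metric}={r.value:.3f} "
            f"(baseline {r.baseline:.3f}, delta {r.delta:+.3f})"
        )
    lines += ["", "Known confounds or caveats:"]
    lines += [f"- {c}" for c in caveats] or ["- none recorded"]
    return "\n".join(lines)


# Assumed mapping. The source names the three actions and the values
# "yes" and "partial"; the value "no" and this pairing are guesses.
NEXT_ACTION = {"yes": "confirm", "partial": "supplement", "no": "pivot"}


def route_verdict(claim_supported: str) -> str:
    """Map a reviewer verdict to the next action, rejecting unknown values."""
    key = claim_supported.strip().lower()
    if key not in NEXT_ACTION:
        raise ValueError(f"unrecognized verdict: {claim_supported!r}")
    return NEXT_ACTION[key]


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    demo = [ExperimentResult("method-A", "dataset-X", "accuracy", 0.812, 0.790)]
    print(build_review_packet("method-A improves accuracy", demo, ["single seed"]))
    print(route_verdict("partial"))
```

Keeping the delta sign-normalized (positive means better, whichever direction the metric runs) is one way to keep the reviewer from misreading a loss-style metric. Whether the actual skill does this is not stated in the listing.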

Details

Author: wanshuiyin
Repository: wanshuiyin/Auto-claude-code-research-in-sleep
Created: 3 months ago
Last Updated: yesterday
Language: Python
License: MIT

Integrates with

OpenAI · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

research-review

Get a deep critical review of research from GPT using a secondary Codex agent. Use when user says "review my research", "help me review", "get external review", or wants critical feedback on research ideas, papers, or experimental results.

11,977 Updated yesterday

wanshuiyin

AI & Automation Listed

exp-eval

Experiment verdict gate — Review LLM independently judges results → 4 verdict paths → auto-update claims confidence, ideas status, graph edges

45 Updated today

Lambenthan

AI & Automation Solid

ablation-planner

Use when main results pass result-to-claim (`claim_supported = yes` or `partial`) and ablation studies are needed for paper submission. A secondary Codex agent designs ablations from a reviewer's perspective; the local executor reviews feasibility and implements.

11,977 Updated yesterday

wanshuiyin