finetuning

Solid

This skill should be used when picking or diagnosing a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO, RLHF), or when the user mentions fine-tuning, post-training, training recipe, reward design, or weight updates. It covers a decision tree by reward shape, a smoke-run gate, three failure diagnostics, and five false-progress patterns; provider recipes and the I/O contract live in references/.

AI & Automation 1,101 stars 81 forks Updated today Apache-2.0


Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Finetuning

Priors, not rules. Only firm guardrails: held-out eval you never train on, no leakage, trust evo's recorded numbers over the run's self-report. Override anything else against the gate.

## Pick the technique by reward shape

Decide on the reward first, technique second. Choosing the comfortable technique over the matching one is the most common failure.

| Reward shape | Technique |
|---|---|
| Verifiable (exact match, unit tests, parser-decidable) | **RL** (GRPO / RLOO / PPO): the reward includes format, so the model learns to emit verifier-acceptable shape |
| Preference pairs (chosen vs rejected) | **DPO / KTO / ORPO**: cheaper than full RL, no rollouts |
| Demonstrations only (curated traces, chat data) | **SFT**: install format/tone/capability the base lacks |
| Have a scorer + want SFT stability | **RFT**: sample, filter by reward, SFT on survivors |

"SFT-then-RL" is not a law. For a competent base model on a verifiable benchmark, RL-from-base often beats SFT-then-RL end-to-end.

## Research the literature before the first commit

The decision tree above is the structural prior. The empirical answer for *this* model on *this* benchmark usually has a recent paper, blog, or HF Space recipe behind it, and what beats baseline on a 4B base model in 2026 is not what the agent's pre-training data captures. Before picking the technique for `exp_0001` (the first experiment after baseline), invoke `evo:ideator` with a `literature` brief:

```
Task( subagent_t...
```
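Of the rows in the reward-shape table, RFT ("sample, filter by reward, SFT on survivors") is the easiest to show concretely. Below is a minimal sketch of its data step, assuming a hypothetical `Answer: <value>` output contract and a caller-supplied `sample` function standing in for whatever generation backend the run actually uses. The names `verifiable_reward`, `rft_filter`, and `demo_sample`, and the 0.0 / 0.1 / 1.0 reward weights, are illustrative assumptions, not part of evo. The reward also scores format, echoing the RL row: output the verifier cannot parse earns nothing.

```python
import re
from typing import Callable, Iterable

# Hypothetical output contract (assumption): the completion must end with "Answer: <value>".
ANSWER_RE = re.compile(r"Answer:\s*(\S+)\s*$")


def verifiable_reward(completion: str, gold: str) -> float:
    """Parser-decidable reward that also scores format (weights are assumptions)."""
    match = ANSWER_RE.search(completion.strip())
    if match is None:
        return 0.0  # wrong shape: the verifier cannot even parse an answer
    return 1.0 if match.group(1) == gold else 0.1  # parseable, then exact match


def rft_filter(
    prompts: Iterable[tuple[str, str]],
    sample: Callable[[str, int], list[str]],
    k: int = 4,
    threshold: float = 1.0,
) -> list[dict[str, str]]:
    """RFT data step: draw k samples per prompt, keep those scoring >= threshold."""
    survivors: list[dict[str, str]] = []
    for prompt, gold in prompts:
        for completion in sample(prompt, k):
            if verifiable_reward(completion, gold) >= threshold:
                survivors.append({"prompt": prompt, "completion": completion})
    return survivors  # these pairs become an ordinary SFT dataset


def demo_sample(prompt: str, k: int) -> list[str]:
    """Hypothetical stand-in for a real generation backend; returns canned completions."""
    canned = ["Answer: 4", "the answer is 4", "Answer: 5", "Work: 2+2=4\nAnswer: 4"]
    return canned[:k]


if __name__ == "__main__":
    data = rft_filter([("What is 2+2?", "4")], demo_sample)
    print(len(data))  # 2 survivors: the two completions ending in "Answer: 4"
```

Lowering `threshold` to 0.1 would also keep well-formed but wrong answers, which is usually not what RFT is after. When the table points at RL instead, a reward function of this shape could score GRPO or RLOO rollouts directly rather than filtering an SFT set.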

Details

Author: evo-hq
Repository: evo-hq/evo
Created: 2 months ago
Last Updated: today
Language: Python
License: Apache-2.0
