finetuning

Solid

This skill should be used when picking or diagnosing a training move (SFT, LoRA, DPO/KTO/ORPO, RFT, GRPO/PPO/RLOO, RLHF), or when the user mentions fine-tuning, post-training, training recipe, reward design, or weight updates. Decision tree by reward shape, smoke-run gate, three failure diagnostics, five false-progress patterns. Provider recipes and I/O contract in references/.

AI & Automation 1,101 stars 81 forks Updated today Apache-2.0


Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Finetuning

Priors, not rules. Only firm guardrails: held-out eval you never train on, no leakage, trust evo's recorded numbers over the run's self-report. Override anything else against the gate.

## Pick the technique by reward shape

Decide on the reward first, technique second. Choosing the comfortable technique over the matching one is the most common failure.

| Reward shape | Technique |
|---|---|
| Verifiable (exact match, unit tests, parser-decidable) | **RL** (GRPO / RLOO / PPO): reward includes format, so the model learns to emit verifier-acceptable shape |
| Preference pairs (chosen vs rejected) | **DPO / KTO / ORPO**: cheaper than full RL, no rollouts |
| Demonstrations only (curated traces, chat data) | **SFT**: install format/tone/capability the base lacks |
| Have a scorer + want SFT stability | **RFT**: sample, filter by reward, SFT on survivors |

"SFT-then-RL" is not a law. For a competent base model on a verifiable benchmark, RL-from-base often beats SFT-then-RL end-to-end.

## Research the literature before the first commit

The decision tree above is the structural prior. The empirical answer for *this* model on *this* benchmark usually has a recent paper, blog, or HF Space recipe behind it, and what beats baseline on a 4B base model in 2026 is not what the agent's pre-training data captures.

Before picking the technique for `exp_0001` (the first experiment after baseline), invoke `evo:ideator` with a `literature` brief:

```
Task( subagent_t...
```
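
As an illustration of the RFT row in the reward-shape table (sample, filter by reward, SFT on survivors), here is a minimal Python sketch of the filtering step only. Everything named here is hypothetical and not part of evo or any provider API: `rft_filter`, the toy sampler `toy_sample`, the toy scorer `toy_score`, and the `threshold` default are assumptions chosen for the example. In a real run the sampler would call the model and the scorer would be the verifier or reward function.

```python
from itertools import cycle
from typing import Callable, Dict, List


def rft_filter(
    prompts: List[str],
    sample_fn: Callable[[str], str],
    score_fn: Callable[[str, str], float],
    samples_per_prompt: int = 4,
    threshold: float = 1.0,
) -> List[Dict[str, str]]:
    """Sample completions, keep those scoring at or above threshold.

    Returns prompt/completion records that a later SFT step could train on.
    """
    survivors: List[Dict[str, str]] = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = sample_fn(prompt)
            if score_fn(prompt, completion) >= threshold:
                survivors.append({"prompt": prompt, "completion": completion})
    return survivors


# Toy stand-ins (assumptions): a deterministic "model" and an exact-match scorer.
_ANSWERS = {"2+2": "4"}
_candidates = cycle(["4", "5", "4", "22"])


def toy_sample(prompt: str) -> str:
    # Hypothetical sampler: cycles through fixed candidate answers.
    return next(_candidates)


def toy_score(prompt: str, completion: str) -> float:
    # Hypothetical verifiable reward: 1.0 on exact match, else 0.0.
    return 1.0 if completion == _ANSWERS[prompt] else 0.0


sft_data = rft_filter(["2+2"], toy_sample, toy_score)
print(len(sft_data), "surviving samples")
```

The sketch does not deduplicate survivors or hold anything out. Under the guardrails above, the prompts passed to such a filter would have to be disjoint from the held-out eval set.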

Details

Author: evo-hq
Repository: evo-hq/evo
Created: 2 months ago
Last Updated: today
Language: Python
License: Apache-2.0
