hugging-face-model-trainer

Featured

Train or fine-tune TRL language models on Hugging Face Jobs, including SFT, DPO, GRPO, and GGUF export.

AI & Automation | 39,350 stars | 6,386 forks | Updated today | MIT


Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# TRL Training on Hugging Face Jobs

## Overview

Train language models using TRL (Transformer Reinforcement Learning) on fully managed Hugging Face infrastructure. No local GPU setup required: models train on cloud GPUs and results are automatically saved to the Hugging Face Hub.

**TRL provides multiple training methods:**

- **SFT** (Supervised Fine-Tuning) - Standard instruction tuning
- **DPO** (Direct Preference Optimization) - Alignment from preference data
- **GRPO** (Group Relative Policy Optimization) - Online RL training
- **Reward Modeling** - Train reward models for RLHF

**For detailed TRL method documentation:**

```python
hf_doc_search("your query", product="trl")
hf_doc_fetch("https://huggingface.co/docs/trl/sft_trainer")  # SFT
hf_doc_fetch("https://huggingface.co/docs/trl/dpo_trainer")  # DPO
# etc.
```

**See also:** `references/training_methods.md` for method overviews and selection guidance

## When to Use This Skill

Use this skill when users want to:

- Fine-tune language models on cloud GPUs without local infrastructure
- Train with TRL methods (SFT, DPO, GRPO, etc.)
- Run training jobs on Hugging Face Jobs infrastructure
- Convert trained models to GGUF for local deployment (Ollama, LM Studio, llama.cpp)
- Ensure trained models are permanently saved to the Hub
- Use modern workflows with optimized defaults

### When to Use Unsloth

Use **Unsloth** (`references/unsloth.md`) instead of standard TRL when:

- **Limited GPU memory** - Unsloth uses ~60% less VR...

Details

Author: sickn33
Repository: sickn33/antigravity-awesome-skills
Created: 4 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content, not just the same category

AI & Automation Listed

hugging-face-model-trainer

This skill should be used when users want to train or fine-tune language models using TRL (Transformer Reinforcement Learning) on Hugging Face Jobs infrastructure. Covers SFT, DPO, GRPO and reward modeling training methods, plus GGUF conversion for local deployment. Includes guidance on the TRL Jobs package, UV scripts with PEP 723 format, dataset preparation and validation, hardware selection, cost estimation, Trackio monitoring, Hub authentication, and model persistence. Should be invoked for tasks involving cloud GPU training, GGUF conversion, or when users mention training on Hugging Face Jobs without local GPU setup.

3 Updated today
tayyabexe
AI & Automation Featured

fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with Hugging Face Transformers.

27,705 Updated today
davila7
AI & Automation Solid

fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with Hugging Face Transformers.

9,182 Updated 1 month ago
Orchestra-Research
AI & Automation Solid

fine-tuning-with-trl

Fine-tune LLMs using reinforcement learning with TRL - SFT for instruction tuning, DPO for preference alignment, PPO/GRPO for reward optimization, and reward model training. Use when you need RLHF, want to align a model with preferences, or want to train from human feedback. Works with Hugging Face Transformers.

175,435 Updated today
NousResearch
AI & Automation Solid

hugging-face-vision-trainer

Train or fine-tune vision models on Hugging Face Jobs for detection, classification, and SAM or SAM2 segmentation.

39,350 Updated today
sickn33