simpo-training
SolidSimple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
Install
Quality Score: 93/100
Skill Content
Details
- Author
- NousResearch
- Repository
- NousResearch/hermes-agent
- Created
- 10 months ago
- Last Updated
- today
- Language
- Python
- License
- MIT
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
simpo-training
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
simpo-training
Simple Preference Optimization for LLM alignment. Reference-free alternative to DPO with better performance (+6.4 points on AlpacaEval 2.0). No reference model needed, more efficient than DPO. Use for preference alignment when want simpler, faster training than DPO/PPO.
nlp-alignment
Best practices for LLM alignment techniques including RLHF, DPO, and instruction tuning. Use when working on alignment or safety.