miles-rl-training


Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use it when training large MoE models with FP8/INT4, when you need train-inference alignment, or when you require speculative RL for maximum throughput.

AI & Automation | 27,984 stars | 2,901 forks | Updated today | MIT



Skill Content

# miles: Enterprise-Grade RL for Large-Scale Model Training

miles is a high-performance, enterprise-ready RL framework optimized for large-scale model post-training. Built as a production fork of slime, it addresses critical challenges in MoE training stability, low-precision training, and train-inference alignment.

## When to Use miles

**Choose miles when you need:**

- Training 1TB+ MoE models (DeepSeek V3, Qwen3-MoE)
- FP8 or INT4 quantization-aware training
- Bit-wise identical train-inference alignment
- Speculative RL for maximum throughput
- Production stability with enterprise support

**Consider alternatives when:**

- You want the research-grade original → use **slime**
- You need flexible backend swapping → use **verl**
- You want PyTorch-native abstractions → use **torchforge**

## Key Features

### Low-Precision Training

- **Unified FP8**: End-to-end FP8 for both inference and training
- **INT4 QAT**: 1TB models on single-machine VRAM (H200)
- **Rollout Routing Replay (R3)**: Bit-wise expert alignment for MoE

### Performance Optimizations

- **Speculative RL**: 25%+ rollout speedup with online SFT draft models
- **Zero-Copy Weight Sync**: CUDA IPC zero-copy mapping
- **Partial Rollout**: Recycle half-finished trajectories

### Train-Inference Alignment

- **TIS/MIS**: Truncated/Masked Importance Sampling for off-policy correction
- **Kernel-level optimization**: FlashAttention-3, DeepGEMM integration

## Installation

```bash
# Recommended: Docker
docker pull rad...
```
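The TIS/MIS entry under Train-Inference Alignment names two importance-sampling corrections without showing what they compute. The sketch below is one common reading, not miles's actual API: for each token, the ratio between the training policy's probability and the rollout engine's probability is either capped at a threshold (truncated) or zeroed out when it exceeds the threshold (masked). The function names, the `clip` parameter, and its default of 2.0 are all assumptions made for illustration.

```python
import math

# Hypothetical helpers for illustration; these are NOT functions from miles.

def truncated_is_weights(train_logprobs, rollout_logprobs, clip=2.0):
    """Per-token weight min(pi_train / pi_rollout, clip).

    `clip` is an assumed hyperparameter name and default.
    """
    return [
        min(math.exp(t - r), clip)
        for t, r in zip(train_logprobs, rollout_logprobs)
    ]

def masked_is_weights(train_logprobs, rollout_logprobs, clip=2.0):
    """Per-token weight equal to the ratio, or 0.0 if the ratio exceeds clip.

    This is one plausible interpretation of "masked" importance sampling:
    tokens with a large train/rollout mismatch are dropped from the loss.
    """
    weights = []
    for t, r in zip(train_logprobs, rollout_logprobs):
        ratio = math.exp(t - r)
        weights.append(ratio if ratio <= clip else 0.0)
    return weights

# Log-probabilities of the same three sampled tokens under each engine.
train = [-1.0, -0.5, -2.0]
rollout = [-1.0, -2.0, -1.5]

tis = truncated_is_weights(train, rollout)
mis = masked_is_weights(train, rollout)
```

In this toy input the second token has a ratio of `exp(1.5)`, well above the threshold, so the truncated variant caps its weight at `clip` while the masked variant removes it entirely. The other two tokens receive the same weight under both schemes. In a real training loop these weights would multiply the per-token policy-gradient loss.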

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT
