miles-rl-training

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

AI & Automation 27,984 stars 2901 forks Updated today MIT

Skill Content

# miles: Enterprise-Grade RL for Large-Scale Model Training

miles is a high-performance, enterprise-ready RL framework optimized for large-scale model post-training. Built as a production fork of slime, it addresses critical challenges in MoE training stability, low-precision training, and train-inference alignment.

## When to Use miles

**Choose miles when you need:**

- Training 1TB+ MoE models (DeepSeek V3, Qwen3-MoE)
- FP8 or INT4 quantization-aware training
- Bit-wise identical train-inference alignment
- Speculative RL for maximum throughput
- Production stability with enterprise support

**Consider alternatives when:**

- You want the research-grade original → use **slime**
- You need flexible backend swapping → use **verl**
- You want PyTorch-native abstractions → use **torchforge**

## Key Features

### Low-Precision Training

- **Unified FP8**: End-to-end FP8 for both inference and training
- **INT4 QAT**: 1TB models on single-machine VRAM (H200)
- **Rollout Routing Replay (R3)**: Bit-wise expert alignment for MoE

### Performance Optimizations

- **Speculative RL**: 25%+ rollout speedup with online SFT draft models
- **Zero-Copy Weight Sync**: CUDA IPC zero-copy mapping
- **Partial Rollout**: Recycle half-finished trajectories

### Train-Inference Alignment

- **TIS/MIS**: Truncated/Masked Importance Sampling for off-policy correction
- **Kernel-level optimization**: FlashAttention-3, DeepGEMM integration

## Installation

```bash
# Recommended: Docker
docker pull rad...
```
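
The TIS/MIS item under Train-Inference Alignment refers to correcting for the gap between the policy that generated a rollout and the policy being trained. The following is a minimal, framework-independent sketch of that idea in plain Python, not the miles implementation. The function names, the cap of `2.0`, and the `[0.5, 2.0]` masking band are hypothetical choices made for illustration; the exact masking rule used by miles is not stated in this document.

```python
import math


def importance_ratios(train_logprobs, rollout_logprobs):
    """Per-token ratio pi_train / pi_rollout, computed from log-probabilities."""
    if len(train_logprobs) != len(rollout_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    return [math.exp(t - r) for t, r in zip(train_logprobs, rollout_logprobs)]


def truncated_is_weights(ratios, cap=2.0):
    """Truncated importance sampling: clip each ratio from above at `cap`.

    The cap value is an illustrative assumption.
    """
    return [min(r, cap) for r in ratios]


def masked_is_weights(ratios, low=0.5, high=2.0):
    """Masked importance sampling (assumed rule): drop tokens whose ratio
    falls outside [low, high] by giving them zero weight."""
    return [r if low <= r <= high else 0.0 for r in ratios]


# Two tokens: the first is far more likely under the training policy than
# under the rollout policy, the second is equally likely under both.
train_lp = [math.log(0.5), math.log(0.9)]
rollout_lp = [math.log(0.1), math.log(0.9)]

ratios = importance_ratios(train_lp, rollout_lp)
tis = truncated_is_weights(ratios, cap=2.0)
mis = masked_is_weights(ratios, low=0.5, high=2.0)
```

In this toy case the first token's ratio is about 5, so truncation caps its weight at the limit while masking removes it from the loss entirely. The second token keeps a weight of 1 under both schemes. Either set of weights would then multiply the per-token policy-gradient loss.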

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic · AI
