miles-rl-training

Provides guidance for enterprise-grade RL training using miles, a production-ready fork of slime. Use when training large MoE models with FP8/INT4, needing train-inference alignment, or requiring speculative RL for maximum throughput.

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT

Quality Score: 94/100

Skill Content

# miles: Enterprise-Grade RL for Large-Scale Model Training

miles is a high-performance, enterprise-ready RL framework optimized for large-scale model post-training. Built as a production fork of slime, it addresses critical challenges in MoE training stability, low-precision training, and train-inference alignment.

## When to Use miles

**Choose miles when you need:**

- Training 1TB+ MoE models (DeepSeek V3, Qwen3-MoE)
- FP8 or INT4 quantization-aware training
- Bit-wise identical train-inference alignment
- Speculative RL for maximum throughput
- Production stability with enterprise support

**Consider alternatives when:**

- You want the research-grade original → use **slime**
- You need flexible backend swapping → use **verl**
- You want PyTorch-native abstractions → use **torchforge**

## Key Features

### Low-Precision Training

- **Unified FP8**: End-to-end FP8 for both inference and training
- **INT4 QAT**: 1TB models on single-machine VRAM (H200)
- **Rollout Routing Replay (R3)**: Bit-wise expert alignment for MoE

### Performance Optimizations

- **Speculative RL**: 25%+ rollout speedup with online SFT draft models
- **Zero-Copy Weight Sync**: CUDA IPC zero-copy mapping
- **Partial Rollout**: Recycle half-finished trajectories

### Train-Inference Alignment

- **TIS/MIS**: Truncated/Masked Importance Sampling for off-policy correction
- **Kernel-level optimization**: FlashAttention-3, DeepGEMM integration

## Installation

```bash
# Recommended: Docker
docker pull rad...
```
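To make the TIS/MIS entry under Train-Inference Alignment more concrete, here is a minimal, framework-independent sketch of truncated and masked importance sampling weights. It is not taken from the miles codebase: the function names, the thresholds (`cap`, `low`, `high`), and the sample log-probabilities are all hypothetical, and the masking rule shown (zero weight outside a band) is one common reading of "masked" importance sampling, assumed here for illustration.

```python
import math


def importance_ratios(train_logprobs, rollout_logprobs):
    """Per-token ratio pi_train / pi_rollout, computed from log-probabilities."""
    return [math.exp(t - r) for t, r in zip(train_logprobs, rollout_logprobs)]


def truncated_is_weights(ratios, cap=2.0):
    """Truncated IS: clip each ratio from above at `cap` (hypothetical default)."""
    return [min(r, cap) for r in ratios]


def masked_is_weights(ratios, low=0.5, high=2.0):
    """Masked IS: keep the ratio inside [low, high], zero it out otherwise.

    The band [low, high] is an illustrative assumption, not a miles default.
    """
    return [r if low <= r <= high else 0.0 for r in ratios]


# Hypothetical per-token log-probs of the sampled tokens under the training
# engine and under the rollout (inference) engine.
train_lp = [-1.0, -0.5, -2.0]
rollout_lp = [-1.0, -1.5, -1.0]

ratios = importance_ratios(train_lp, rollout_lp)
tis = truncated_is_weights(ratios)
mis = masked_is_weights(ratios)

print("ratios:", ratios)
print("TIS weights:", tis)
print("MIS weights:", mis)
```

In a policy-gradient loss these weights would multiply the per-token loss terms, so tokens where the training and rollout engines disagree strongly are either down-weighted (truncation) or dropped (masking).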

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT

Integrates with

Hugging Face · AI
