slime-rl-training

Featured

Provides guidance for LLM post-training with RL using slime, a Megatron+SGLang framework. Use when training GLM models, implementing custom data generation workflows, or needing tight Megatron-LM integration for RL scaling.

AI & Automation 27,984 stars 2901 forks Updated today MIT

Install

View on GitHub

Quality Score: 99/100

Stars 20%

100

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# slime: LLM Post-Training Framework for RL Scaling slime is an LLM post-training framework from Tsinghua's THUDM team, powering GLM-4.5, GLM-4.6, and GLM-4.7. It connects Megatron-LM for training with SGLang for high-throughput rollout generation. ## When to Use slime **Choose slime when you need:** - Megatron-LM native training with SGLang inference - Custom data generation workflows with flexible data buffers - Training GLM, Qwen3, DeepSeek V3, or Llama 3 models - Research-grade framework with production backing (Z.ai) **Consider alternatives when:** - You need enterprise-grade stability features → use **miles** - You want flexible backend swapping → use **verl** - You need PyTorch-native abstractions → use **torchforge** ## Key Features - **Training**: Megatron-LM with full parallelism support (TP, PP, DP, SP) - **Rollout**: SGLang-based high-throughput generation with router - **Data Buffer**: Flexible prompt management and sample storage - **Models**: GLM-4.x, Qwen3, DeepSeek V3/R1, Llama 3 ## Architecture Overview ``` ┌─────────────────────────────────────────────────────────┐ │ Data Buffer │ │ - Prompt initialization and management │ │ - Custom data generation and filtering │ │ - Rollout sample storage │ └─────────────┬───────────────────────────┬───────────────┘ │ │ ┌─────────────▼───────────┐ ┌─────────────▼────...

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 11 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid