deepspeed
Expert guidance for distributed training with DeepSpeed: ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention
Quality Score: 94/100
Details
- Author: Orchestra-Research
- Repository: Orchestra-Research/AI-Research-SKILLs
- Created: 7 months ago
- Last Updated: 1 month ago
- Language: TeX
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
optimizing-deep-learning-models
This skill optimizes deep learning models using various techniques. It is triggered when the user requests improvements to model performance, such as increasing accuracy, reducing training time, or minimizing resource consumption. The skill leverages advanced optimization algorithms like Adam, SGD, and learning rate scheduling. It analyzes the existing model architecture, training data, and performance metrics to identify areas for enhancement. The skill then automatically applies appropriate optimization strategies and generates optimized code. Use this skill when the user mentions "optimize deep learning model", "improve model accuracy", "reduce training time", or "optimize learning rate".
edge-deployment-skill
ML model optimization and deployment on robot edge devices (Jetson, embedded)