deepspeed

Solid

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention

AI & Automation · 9,609 stars · 724 forks · Updated 1 month ago · MIT

Quality Score: 94/100

Weighted components (scores shown where the page lists them): Stars 20% (100) · Recency 20% · Frontmatter 20% · Documentation 15% (100) · Issue Health 10% · License 10% (100) · Description 5% (100)

Skill Content

# DeepSpeed Skill

Comprehensive assistance with DeepSpeed development, generated from official documentation.

## When to Use This Skill

This skill should be triggered when:

- Working with DeepSpeed
- Asking about DeepSpeed features or APIs
- Implementing DeepSpeed solutions
- Debugging DeepSpeed code
- Learning DeepSpeed best practices

## Quick Reference

### Common Patterns

**Pattern 1:** DeepNVMe

Contents:

- Requirements
- Creating DeepNVMe Handles
- Using DeepNVMe Handles
- Blocking File Write
- Non-Blocking File Write
- Parallel File Write
- Pinned Tensors
- Putting it together
- Acknowledgements
- Appendix
- Advanced Handle Creation
- Performance Tuning
- DeepNVMe APIs
- General I/O APIs
- GDS-specific APIs
- Handle Settings APIs

This tutorial will show how to use DeepNVMe for data transfers between persistent storage and tensors residing in host or device memory. DeepNVMe improves the performance and efficiency of I/O operations in deep learning applications through optimizations built on Non-Volatile Memory Express (NVMe) Solid State Drives (SSDs), Linux Asynchronous I/O (libaio), and NVIDIA Magnum IO™ GPUDirect® Storage (GDS).

Requirements

Ensure your environment is properly configured to use DeepNVMe. First, install DeepSpeed version >= 0.15.0. Next, ensure that the DeepNVMe operators are available in the DeepSpeed installation. The async_io operator is required for any DeepNVMe functionality, while the gds operator is required only for GDS functionality. You can confirm a...
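The version requirement above (DeepSpeed >= 0.15.0) can be checked before any DeepNVMe code runs. The following is a minimal sketch, not part of the DeepSpeed tutorial: the helper names `parse_release`, `meets_minimum`, and `installed_deepspeed_version` are hypothetical, and the check assumes the package is installed under the distribution name `deepspeed`. It only compares the numeric release and does not verify that the async_io or gds operators are available.

```python
import re
from importlib import metadata

# Minimum version stated in the Requirements text above.
MIN_DEEPSPEED = (0, 15, 0)


def parse_release(version: str) -> tuple:
    """Return the leading numeric release, e.g. '0.15.1+abc' -> (0, 15, 1).

    Pre-release and local suffixes (rc1, .dev0, +cu121) are ignored,
    which is a simplification made for this sketch.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version: str, minimum: tuple = MIN_DEEPSPEED) -> bool:
    """True if `version` is at least `minimum`, comparing numeric parts only."""
    release = parse_release(version)
    width = max(len(release), len(minimum))
    # Pad with zeros so that '1.0' compares equal to (1, 0, 0).
    padded_release = release + (0,) * (width - len(release))
    padded_minimum = tuple(minimum) + (0,) * (width - len(minimum))
    return padded_release >= padded_minimum


def installed_deepspeed_version():
    """Return the installed DeepSpeed version string, or None if not installed."""
    try:
        return metadata.version("deepspeed")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_deepspeed_version()
    if found is None:
        print("DeepSpeed is not installed")
    elif meets_minimum(found):
        print(f"DeepSpeed {found} satisfies the minimum version")
    else:
        print(f"DeepSpeed {found} is older than the required 0.15.0")
```

Reading the version from package metadata avoids importing DeepSpeed itself, so the check stays cheap and works on machines without a GPU.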

Details

Author: Orchestra-Research
Repository: Orchestra-Research/AI-Research-SKILLs
Created: 7 months ago
Last Updated: 1 month ago
Language: TeX
License: MIT

Integrates with

Hugging Face · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

deepspeed

Expert guidance for distributed training with DeepSpeed - ZeRO optimization stages, pipeline parallelism, FP16/BF16/FP8, 1-bit Adam, sparse attention

27,984 Updated today

davila7

AI & Automation Solid

optimizing-deep-learning-models

This skill optimizes deep learning models using various techniques. It is triggered when the user requests improvements to model performance, such as increasing accuracy, reducing training time, or minimizing resource consumption. The skill leverages advanced optimization algorithms like Adam, SGD, and learning rate scheduling. It analyzes the existing model architecture, training data, and performance metrics to identify areas for enhancement. The skill then automatically applies appropriate optimization strategies and generates optimized code. Use this skill when the user mentions "optimize deep learning model", "improve model accuracy", "reduce training time", or "optimize learning rate".

2,359 Updated today

jeremylongshore

AI & Automation Solid

edge-deployment-skill

ML model optimization and deployment on robot edge devices (Jetson, embedded)

1,313 Updated today

a5c-ai