cuda-graphs


Expert skill for CUDA Graph capture and optimization for reduced launch overhead. Capture CUDA operations into graphs, instantiate and execute graph instances, update graph node parameters, profile graph vs stream execution, design graph-friendly kernel patterns, and optimize launch latency for inference.

AI & Automation 814 stars 53 forks Updated today MIT



Skill Content

# cuda-graphs

You are **cuda-graphs**, a specialized skill for CUDA Graph capture and optimization. This skill provides expert capabilities for reducing kernel launch overhead and optimizing execution patterns through graph-based workflows.

## Overview

This skill enables AI-powered CUDA Graph operations, including:

- Capturing CUDA operations into graphs
- Instantiating and executing graph instances
- Updating graph node parameters
- Profiling graph vs. stream execution
- Designing graph-friendly kernel patterns
- Handling conditional graph execution
- Integrating graphs with NCCL operations
- Optimizing launch latency for inference

## Prerequisites

- NVIDIA CUDA Toolkit 10.0+ (basic graphs)
- CUDA 11.0+ for graph updates
- CUDA 12.3+ for conditional nodes
- GPU with compute capability 7.0+
- Nsight Systems for graph profiling

## Capabilities

### 1. Basic Stream Capture

Capture stream operations into a graph:

```cuda
#include <cuda_runtime.h>

cudaGraph_t graph;
cudaGraphExec_t graphExec;
cudaStream_t stream;
cudaStreamCreate(&stream);

// Begin stream capture
cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);

// Kernels launched during capture are recorded into the graph, not executed
kernel1<<<grid1, block1, 0, stream>>>(args1);
kernel2<<<grid2, block2, 0, stream>>>(args2);
kernel3<<<grid3, block3, 0, stream>>>(args3);

// End capture and create graph
cudaStreamEndCapture(stream, &graph);

// Instantiate the graph for execution
// (CUDA 12+ form: cudaGraphInstantiate(&graphExec, graph, 0))
cudaGraphInstantiate(&graphExec, graph, NULL, NULL, 0);

// Execute...
```
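The capture excerpt stops at the execute step. The following is a minimal, self-contained sketch of the rest of the lifecycle, not the skill's own code. `addOne`, `scale`, and `runCapturedGraph` are hypothetical stand-ins for `kernel1`..`kernel3`, and the sketch uses `cudaGraphInstantiateWithFlags`, which assumes CUDA 11.4 or newer. The graph is captured and instantiated once, then replayed with `cudaGraphLaunch`. Those replays are where the launch-overhead savings come from.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line context if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical stand-ins for kernel1..kernel3 in the excerpt.
__global__ void addOne(float* x, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] += 1.0f;
}

__global__ void scale(float* x, int n, float s) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] *= s;
}

// Captures addOne -> scale(2) -> addOne once, replays the graph `iters`
// times, and returns x[0]. Each replay maps x to 2x + 3.
float runCapturedGraph(int n, int iters) {
  float* d_x = nullptr;
  cudaStream_t stream;
  CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
  CUDA_CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));
  // Issued before capture begins, so it runs now and is not recorded.
  CUDA_CHECK(cudaMemsetAsync(d_x, 0, n * sizeof(float), stream));

  dim3 block(256);
  dim3 grid((n + block.x - 1) / block.x);

  cudaGraph_t graph;
  cudaGraphExec_t graphExec;
  CUDA_CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  // Launches issued during capture are recorded into the graph, not run.
  addOne<<<grid, block, 0, stream>>>(d_x, n);
  scale<<<grid, block, 0, stream>>>(d_x, n, 2.0f);
  addOne<<<grid, block, 0, stream>>>(d_x, n);
  CUDA_CHECK(cudaStreamEndCapture(stream, &graph));

  // Assumption: CUDA 11.4+ for cudaGraphInstantiateWithFlags.
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));

  // Instantiate once, launch many times: each cudaGraphLaunch submits all
  // three kernels with a single host-side call.
  for (int it = 0; it < iters; ++it) {
    CUDA_CHECK(cudaGraphLaunch(graphExec, stream));
  }
  CUDA_CHECK(cudaStreamSynchronize(stream));

  float x0 = 0.0f;
  CUDA_CHECK(cudaMemcpy(&x0, d_x, sizeof(float), cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaGraphExecDestroy(graphExec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_x));
  return x0;
}
```

For example, `runCapturedGraph(1 << 16, 3)` should return `21.0f`. Starting from 0, three replays of x → 2x + 3 give 3, 9, and then 21.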

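One way to handle the overview item on updating graph node parameters is to patch a kernel node inside an already-instantiated graph rather than re-instantiating it. The following is a minimal sketch, assuming CUDA 11.4 or newer. `fillValue` and `patchKernelArg` are hypothetical names. The sketch builds the graph explicitly with `cudaGraphAddKernelNode` so the node handle is directly available, then calls `cudaGraphExecKernelNodeSetParams`. For a graph obtained by stream capture, the node handles would first have to be looked up, for example with `cudaGraphGetNodes`.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with file/line context if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                   cudaGetErrorString(err_));                     \
      std::exit(EXIT_FAILURE);                                    \
    }                                                             \
  } while (0)

// Hypothetical kernel: writes v to every element of x.
__global__ void fillValue(float* x, int n, float v) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) x[i] = v;
}

// Builds a one-node graph, launches it with `first`, patches the kernel
// argument to `second` in the executable graph (no re-instantiation),
// launches again, and returns x[0].
float patchKernelArg(int n, float first, float second) {
  float* d_x = nullptr;
  cudaStream_t stream;
  CUDA_CHECK(cudaMalloc(&d_x, n * sizeof(float)));
  CUDA_CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

  // kernelParams holds one pointer per kernel argument, in order.
  void* args[] = {&d_x, &n, &first};
  cudaKernelNodeParams params = {};
  params.func = (void*)fillValue;
  params.gridDim = dim3((n + 255) / 256);
  params.blockDim = dim3(256);
  params.sharedMemBytes = 0;
  params.kernelParams = args;
  params.extra = nullptr;

  cudaGraph_t graph;
  cudaGraphNode_t node;
  cudaGraphExec_t graphExec;
  CUDA_CHECK(cudaGraphCreate(&graph, 0));
  CUDA_CHECK(cudaGraphAddKernelNode(&node, graph, nullptr, 0, &params));
  CUDA_CHECK(cudaGraphInstantiateWithFlags(&graphExec, graph, 0));
  CUDA_CHECK(cudaGraphLaunch(graphExec, stream));

  // Only the scalar argument changes; the topology stays the same. The
  // update affects future launches only, so the launch already enqueued
  // above still writes `first`.
  args[2] = &second;
  CUDA_CHECK(cudaGraphExecKernelNodeSetParams(graphExec, node, &params));
  CUDA_CHECK(cudaGraphLaunch(graphExec, stream));
  CUDA_CHECK(cudaStreamSynchronize(stream));

  float x0 = 0.0f;
  CUDA_CHECK(cudaMemcpy(&x0, d_x, sizeof(float), cudaMemcpyDeviceToHost));

  CUDA_CHECK(cudaGraphExecDestroy(graphExec));
  CUDA_CHECK(cudaGraphDestroy(graph));
  CUDA_CHECK(cudaStreamDestroy(stream));
  CUDA_CHECK(cudaFree(d_x));
  return x0;
}
```

Patching the executable graph in place avoids the cost of another instantiation, which matters when only per-call values such as scalars or buffer pointers change between replays.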
Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
