warp-primitives

Solid

Warp-level programming and SIMD optimization. Use warp shuffle instructions, voting functions, cooperative groups, and warp-synchronous algorithms, and minimize warp divergence for optimal GPU performance.

AI & Automation · 814 stars · 53 forks · Updated today · MIT



Quality Score: 95/100

Stars (weight 20%): 97
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# warp-primitives

You are **warp-primitives**, a specialized skill for warp-level programming and SIMD optimization on GPUs. This skill provides expert capabilities for low-level GPU performance optimization.

## Overview

This skill enables AI-powered warp-level programming, including:

- Use warp shuffle instructions (`__shfl_*_sync`)
- Implement warp voting functions (`__ballot_sync`, `__any_sync`, `__all_sync`)
- Design warp-synchronous algorithms
- Optimize warp divergence patterns
- Use cooperative groups for flexible synchronization
- Implement warp-level reductions
- Analyze and minimize warp stalls
- Support CUDA 11+ warp intrinsics

## Prerequisites

- CUDA Toolkit 11.0+
- GPU with compute capability 3.5+ (CUDA 11 no longer supports compute capability 3.0)
- Understanding of the SIMT execution model

## Capabilities

### 1. Warp Shuffle Instructions

Data exchange within a warp:

```cuda
// __shfl_sync: every lane reads val from lane srcLane (broadcast)
__device__ float warpBroadcast(float val, int srcLane) {
    return __shfl_sync(0xffffffff, val, srcLane);
}

// __shfl_up_sync: lane i reads from lane i - delta (used in inclusive scans)
__device__ float shflUp(float val, int delta) {
    return __shfl_up_sync(0xffffffff, val, delta);
}

// __shfl_down_sync: lane i reads from lane i + delta (used in reductions)
__device__ float shflDown(float val, int delta) {
    return __shfl_down_sync(0xffffffff, val, delta);
}

// __shfl_xor_sync: lane i reads from lane i ^ laneMask (butterfly pattern, used in reductions)
__device__ float shflXor(float val, int laneMask) {
    return __shfl_xor_sync(0xffffffff, val, laneMask);
}

// Warp-level reduction using shuffle
__device__ float warpReduce...
```
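The listing cuts off at the reduction helper, so its exact body is not shown. As a minimal sketch of the usual shuffle-based tree reduction, assuming a full 32-lane warp and an `int` payload, the hypothetical `warpReduceSumSketch` below halves the shuffle distance on each step so that lane 0 ends up holding the warp total. `warpSumKernel` and `runWarpSumSketch` are likewise illustrative names, not part of the skill.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: tree reduction over a full warp (mask 0xffffffff).
// Offsets 16, 8, 4, 2, 1; after the loop only lane 0 is guaranteed to hold
// the complete sum (other lanes hold partial sums).
__device__ int warpReduceSumSketch(int val) {
    for (int offset = 16; offset > 0; offset >>= 1) {
        val += __shfl_down_sync(0xffffffff, val, offset);
    }
    return val;
}

// Each lane contributes its own lane index; lane 0 stores the warp total.
__global__ void warpSumKernel(int* out) {
    int lane = threadIdx.x & 31;
    int sum = warpReduceSumSketch(lane);
    if (lane == 0) {
        *out = sum;
    }
}

// Hypothetical host helper: launches exactly one warp and returns
// 0 + 1 + ... + 31, or -1 if a CUDA call fails.
int runWarpSumSketch() {
    int* dOut = nullptr;
    int hOut = -1;
    if (cudaMalloc(&dOut, sizeof(int)) != cudaSuccess) return -1;
    warpSumKernel<<<1, 32>>>(dOut);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(&hOut, dOut, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dOut);
    return err == cudaSuccess ? hOut : -1;
}
```

On a CUDA-capable device, `runWarpSumSketch()` should return 496, the sum of 0 through 31. Because `__shfl_down_sync` hands back the caller's own value when the source lane is past the end of the warp, the upper lanes accumulate meaningless partial sums; only lane 0's result should be used.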
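The `__shfl_xor_sync` butterfly pattern differs from `__shfl_down_sync` in that every lane, not just lane 0, ends up with the reduced value. A minimal sketch, assuming a full warp and a max reduction over `float` values; `warpAllReduceMaxSketch`, `warpMaxKernel`, and `runWarpMaxSketch` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: butterfly all-reduce. At each step lane i exchanges
// with lane i ^ mask, so after log2(32) = 5 steps every lane holds the max.
__device__ float warpAllReduceMaxSketch(float val) {
    for (int mask = 16; mask > 0; mask >>= 1) {
        val = fmaxf(val, __shfl_xor_sync(0xffffffff, val, mask));
    }
    return val;
}

// Lane 7 holds the largest value (100); every lane writes what it sees.
__global__ void warpMaxKernel(float* out) {
    int lane = threadIdx.x & 31;
    float v = (lane == 7) ? 100.0f : static_cast<float>(lane);
    out[lane] = warpAllReduceMaxSketch(v);
}

// Hypothetical host helper: returns how many of the 32 lanes observed the
// maximum, or -1 if a CUDA call fails.
int runWarpMaxSketch() {
    float* dOut = nullptr;
    float hOut[32];
    if (cudaMalloc(&dOut, 32 * sizeof(float)) != cudaSuccess) return -1;
    warpMaxKernel<<<1, 32>>>(dOut);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hOut, dOut, 32 * sizeof(float), cudaMemcpyDeviceToHost);
    }
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;
    int count = 0;
    for (int i = 0; i < 32; ++i) {
        if (hOut[i] == 100.0f) ++count;
    }
    return count;
}
```

All 32 lanes should report 100, which is why the XOR form is a natural fit when every lane needs the result afterwards (for example, to normalize its own element).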
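The shuffle comments mention inclusive scans built on `__shfl_up_sync`. One common construction is the Hillis-Steele doubling scan sketched below, assuming a full warp and `int` data; `warpInclusiveScanSketch`, `warpScanKernel`, and `runWarpScanSketch` are hypothetical names. The `lane >= offset` guard matters because lanes below `offset` get their own value back from `__shfl_up_sync` and must not add it.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: inclusive prefix sum across a full warp.
// Offsets 1, 2, 4, 8, 16. Every lane executes the shuffle; only lanes with
// a valid source (lane >= offset) add the received value.
__device__ int warpInclusiveScanSketch(int val) {
    int lane = threadIdx.x & 31;
    for (int offset = 1; offset < 32; offset <<= 1) {
        int received = __shfl_up_sync(0xffffffff, val, offset);
        if (lane >= offset) {
            val += received;
        }
    }
    return val;
}

// Every lane contributes 1, so lane i should end with i + 1.
__global__ void warpScanKernel(int* out) {
    int lane = threadIdx.x & 31;
    out[lane] = warpInclusiveScanSketch(1);
}

// Hypothetical host helper: returns how many lanes hold the expected
// prefix sum, or -1 if a CUDA call fails.
int runWarpScanSketch() {
    int* dOut = nullptr;
    int hOut[32];
    if (cudaMalloc(&dOut, 32 * sizeof(int)) != cudaSuccess) return -1;
    warpScanKernel<<<1, 32>>>(dOut);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hOut, dOut, 32 * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;
    int correct = 0;
    for (int i = 0; i < 32; ++i) {
        if (hOut[i] == i + 1) ++correct;
    }
    return correct;
}
```

Note that the shuffle sits outside the `if`: moving it inside the guard would leave some lanes named in the mask not executing the call, which is undefined behavior for the `_sync` intrinsics.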
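For the voting functions, the `_sync` forms take an explicit participation mask; the legacy unsynchronized `__ballot`, `__any`, and `__all` have been deprecated since CUDA 9. A minimal sketch of all three votes on a full warp, using `__popc` to count the set bits of the ballot; `warpVoteKernel` and `runWarpVoteSketch` are hypothetical names.

```cuda
#include <cuda_runtime.h>

// Each lane votes on simple predicates of its lane index.
__global__ void warpVoteKernel(int* out) {
    int lane = threadIdx.x & 31;
    // Bit i of the ballot is set if lane i's predicate is true.
    unsigned int evenLanes = __ballot_sync(0xffffffff, lane % 2 == 0);
    // Nonzero if at least one lane's predicate is true (only lane 31 here).
    int anyAbove30 = __any_sync(0xffffffff, lane > 30);
    // Nonzero only if every lane's predicate is true (always true here).
    int allBelow32 = __all_sync(0xffffffff, lane < 32);
    if (lane == 0) {
        out[0] = __popc(evenLanes);  // number of even lanes in 0..31
        out[1] = anyAbove30;
        out[2] = allBelow32;
    }
}

// Hypothetical host helper: returns the even-lane count if both the any and
// all votes were nonzero, 0 otherwise, or -1 if a CUDA call fails.
int runWarpVoteSketch() {
    int* dOut = nullptr;
    int hOut[3] = {0, 0, 0};
    if (cudaMalloc(&dOut, 3 * sizeof(int)) != cudaSuccess) return -1;
    warpVoteKernel<<<1, 32>>>(dOut);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaMemcpy(hOut, dOut, 3 * sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;
    return (hOut[1] != 0 && hOut[2] != 0) ? hOut[0] : 0;
}
```

A ballot mask combined with `__popc` (or `__ffs` to find the first set lane) is the usual building block for warp-aggregated operations, such as letting one lane perform a single atomic on behalf of all voting lanes.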
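The overview also lists cooperative groups. The sketch below expresses the same shuffle reduction with a statically sized tile, assuming CUDA 11+ and `<cooperative_groups.h>`: `cg::tiled_partition<32>` splits the block into 32-thread tiles whose `shfl_down` member replaces the explicit mask argument, and each tile's leader adds its partial sum into a global total with `atomicAdd`. The kernel and helper names and the 1000-element all-ones input are hypothetical.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical helper: reduce within a 32-thread tile; tile rank 0 gets the sum.
__device__ int tileReduceSumSketch(cg::thread_block_tile<32> tile, int val) {
    for (int offset = 16; offset > 0; offset >>= 1) {
        val += tile.shfl_down(val, offset);
    }
    return val;
}

__global__ void cgSumKernel(const int* in, int* total, int n) {
    cg::thread_block block = cg::this_thread_block();
    cg::thread_block_tile<32> tile = cg::tiled_partition<32>(block);
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    // Out-of-range threads contribute 0 but still take part in the shuffle,
    // so no tile runs with missing lanes.
    int val = (idx < n) ? in[idx] : 0;
    int sum = tileReduceSumSketch(tile, val);
    if (tile.thread_rank() == 0) {
        atomicAdd(total, sum);
    }
}

// Hypothetical host helper: sums n ones on the device; returns the total,
// or -1 if a CUDA call fails.
int runCgSumSketch() {
    const int n = 1000;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<int> hIn(n, 1);
    int* dIn = nullptr;
    int* dTotal = nullptr;
    int hTotal = -1;
    if (cudaMalloc(&dIn, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&dTotal, sizeof(int)) != cudaSuccess) {
        cudaFree(dIn);
        return -1;
    }
    cudaError_t err = cudaMemcpy(dIn, hIn.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemset(dTotal, 0, sizeof(int));
    if (err == cudaSuccess) {
        cgSumKernel<<<blocks, threads>>>(dIn, dTotal, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(&hTotal, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(dIn);
    cudaFree(dTotal);
    return err == cudaSuccess ? hTotal : -1;
}
```

With 1000 ones the helper should return 1000. One design point worth keeping: padding out-of-range threads with the identity value (0 for a sum) instead of returning early keeps every tile fully populated for the shuffle.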
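To make "optimize warp divergence patterns" concrete, here is a sketch, under the assumption that the work can be remapped to threads freely, of turning a per-lane branch into a per-warp one. In `divergentKernel`, even and odd lanes of the same warp take different paths; `warpUniformKernel` assigns all even elements to the first half of the threads and all odd elements to the second half, so every warp follows a single path while producing identical results. Both kernels and `runDivergenceSketch` are hypothetical names, and the remapping assumes `n` is a multiple of 64.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Divergent: even and odd threads of the same warp take different branches,
// so each warp executes both paths with half its lanes masked off each time.
__global__ void divergentKernel(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0) {
        data[i] *= 2.0f;
    } else {
        data[i] += 1.0f;
    }
}

// Warp-uniform remapping of the same work: threads [0, n/2) handle even
// elements and threads [n/2, n) handle odd ones. Assumes n is a multiple of
// 64, so n/2 is a multiple of 32 and no warp straddles the boundary.
__global__ void warpUniformKernel(float* data, int n) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int half = n / 2;
    if (t >= n) return;
    if (t < half) {
        data[2 * t] *= 2.0f;
    } else {
        data[2 * (t - half) + 1] += 1.0f;
    }
}

// Hypothetical host helper: runs both kernels on copies of the same input and
// returns the number of mismatching elements, or -1 if a CUDA call fails.
int runDivergenceSketch() {
    const int n = 256;  // multiple of 64, per the assumption above
    std::vector<float> hIn(n), hA(n), hB(n);
    for (int i = 0; i < n; ++i) hIn[i] = static_cast<float>(i);
    float* dA = nullptr;
    float* dB = nullptr;
    if (cudaMalloc(&dA, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&dB, n * sizeof(float)) != cudaSuccess) {
        cudaFree(dA);
        return -1;
    }
    cudaError_t err = cudaMemcpy(dA, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hIn.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        divergentKernel<<<(n + 127) / 128, 128>>>(dA, n);
        warpUniformKernel<<<(n + 127) / 128, 128>>>(dB, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hA.data(), dA, n * sizeof(float), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) err = cudaMemcpy(hB.data(), dB, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    if (err != cudaSuccess) return -1;
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (hA[i] != hB[i]) ++mismatches;
    }
    return mismatches;
}
```

The remapped kernel trades divergence for stride-2 memory access, which lowers coalescing efficiency, and for a branch this short the compiler may predicate both paths anyway. Profiling with a tool such as Nsight Compute is the way to decide which form is faster for a given kernel.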

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT
