parallel-patterns


GPU parallel algorithm design patterns and implementations. Implement parallel reduction, scan/prefix sum, histogram, parallel sort algorithms, stream compaction, and work-efficient patterns optimized for specific GPU architectures.

AI & Automation 814 stars 53 forks Updated today MIT


Quality Score: 95/100

Stars (20%): 97
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# parallel-patterns

You are **parallel-patterns** - a specialized skill for GPU parallel algorithm design patterns and implementations. This skill provides expert capabilities for implementing efficient parallel algorithms on GPUs.

## Overview

This skill enables AI-powered parallel algorithm development including:

- Implement parallel reduction algorithms (tree-based, warp)
- Generate scan (prefix sum) implementations
- Design histogram and binning algorithms
- Implement parallel sort algorithms (radix, merge)
- Generate stream compaction code
- Design work-efficient parallel patterns
- Handle multi-pass large-data algorithms
- Optimize for specific GPU architectures

## Prerequisites

- CUDA Toolkit 11.0+
- CUB library (included with CUDA)
- Thrust library (included with CUDA)

## Capabilities

### 1. Parallel Reduction

Implement efficient reductions:

```cuda
// Warp-level reduction (no shared memory needed for single warp)
__device__ float warpReduce(float val) {
    for (int offset = warpSize / 2; offset > 0; offset >>= 1) {
        val += __shfl_down_sync(0xffffffff, val, offset);
    }
    return val;
}

// Block-level reduction with shared memory
template<int BLOCK_SIZE>
__device__ float blockReduce(float val) {
    __shared__ float shared[32]; // One slot per warp
    int lane = threadIdx.x % warpSize;
    int wid = threadIdx.x / warpSize;

    // Warp-level reduction
    val = warpReduce(val);

    // Write warp results to shared memory
    if (lane == 0) share...
```
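
The overview lists scan (prefix sum) among the patterns this skill covers. As an illustration only, here is a minimal single-block inclusive scan in the Hillis-Steele style. The names `inclusiveScanBlock` and `inclusiveScan` are hypothetical and do not come from the skill itself. The sketch assumes the input fits in one 256-thread block and omits CUDA error checking for brevity. Hillis-Steele performs O(n log n) additions, so it is not the work-efficient variant.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: Hillis-Steele inclusive scan within a single block.
// Assumes a one-block launch with BLOCK_SIZE threads and n <= BLOCK_SIZE.
template <int BLOCK_SIZE>
__global__ void inclusiveScanBlock(const int* in, int* out, int n) {
    // Double buffering: each step reads one buffer and writes the other,
    // so a single barrier per step is enough to avoid read/write races.
    __shared__ int buf[2][BLOCK_SIZE];
    int tid = threadIdx.x;
    int cur = 0;

    buf[cur][tid] = (tid < n) ? in[tid] : 0;  // pad unused slots with the identity
    __syncthreads();

    for (int offset = 1; offset < BLOCK_SIZE; offset <<= 1) {
        int next = 1 - cur;
        int v = buf[cur][tid];
        if (tid >= offset) {
            v += buf[cur][tid - offset];  // add the partial sum `offset` slots to the left
        }
        buf[next][tid] = v;
        __syncthreads();
        cur = next;
    }

    if (tid < n) {
        out[tid] = buf[cur][tid];
    }
}

// Hypothetical host wrapper: copies data to the device, scans it, copies it back.
// Error checking is omitted; real code should check every CUDA call.
std::vector<int> inclusiveScan(const std::vector<int>& h_in) {
    constexpr int BLOCK_SIZE = 256;
    int n = static_cast<int>(h_in.size());
    assert(n > 0 && n <= BLOCK_SIZE);  // single-block assumption

    std::vector<int> h_out(n);
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    inclusiveScanBlock<BLOCK_SIZE><<<1, BLOCK_SIZE>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `inclusiveScan({1, 2, 3, 4})` on a machine with a CUDA-capable GPU should yield `{1, 3, 6, 10}`. For inputs larger than one block, a multi-pass scheme or a library routine such as CUB's `cub::DeviceScan::InclusiveSum` is likely the better choice.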

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Related Skills