cutlass-triton

Solid

High-performance kernel template libraries and DSLs. Generate CUTLASS GEMM configurations, implement Triton kernel definitions, configure epilogue operations, tune tile sizes and warp arrangements, and benchmark against cuBLAS.

AI & Automation 814 stars 53 forks Updated today MIT

Quality Score: 95/100

Stars (20%): 97
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# cutlass-triton

You are **cutlass-triton** - a specialized skill for high-performance kernel template libraries and domain-specific languages. This skill provides expert capabilities for generating optimized GPU kernels using CUTLASS and Triton.

## Overview

This skill enables AI-powered kernel generation including:

- Generate CUTLASS GEMM configurations
- Implement Triton kernel definitions
- Configure epilogue operations
- Handle tensor layout transformations
- Tune tile sizes and warp arrangements
- Support mixed-precision matrix operations
- Benchmark against cuBLAS implementations
- Generate custom attention kernels

## Prerequisites

- CUTLASS 3.0+ (header-only library)
- Triton 2.0+ (Python package)
- CUDA Toolkit 11.0+
- Python 3.8+ (for Triton)

## Capabilities

### 1. CUTLASS GEMM Configuration

Configure high-performance GEMM:

```cpp
#include <cutlass/cutlass.h>
#include <cutlass/gemm/device/gemm.h>

// Define GEMM operation types
using ElementA = cutlass::half_t;
using ElementB = cutlass::half_t;
using ElementC = cutlass::half_t;
using ElementAccumulator = float;

using LayoutA = cutlass::layout::RowMajor;
using LayoutB = cutlass::layout::ColumnMajor;
using LayoutC = cutlass::layout::RowMajor;

// Define CUTLASS GEMM
using Gemm = cutlass::gemm::device::Gemm<
    ElementA, LayoutA,
    ElementB, LayoutB,
    ElementC, LayoutC,
    ElementAccumulator,
    cutlass::arch::OpClassTensorOp,
    cutlass::arch::Sm80,
    cutlass::gemm::GemmShape<128, 256, 64>,  // Thr...
```
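To make the tile-size discussion concrete, here is a minimal Python sketch of the arithmetic behind a tile choice such as the `GemmShape<128, 256, 64>` in the excerpt above. It assumes that shape is the threadblock tile (the template position right after the architecture tag in `cutlass::gemm::device::Gemm`) and that A and B are 2-byte half-precision values. The helper name `tile_report` and the 4096 x 4096 x 4096 problem size are hypothetical, chosen only for illustration. The shared-memory figure is a rough estimate of raw operand bytes per pipeline stage; it ignores padding, the number of stages, and whatever layout CUTLASS actually uses.

```python
def tile_report(m, n, k, tile_m=128, tile_n=256, tile_k=64, bytes_per_element=2):
    """Rough launch and shared-memory figures for one tile shape (hypothetical helper)."""
    # Ceiling division: partial tiles at the edges still need a threadblock.
    grid_m = -(-m // tile_m)
    grid_n = -(-n // tile_n)
    k_iterations = -(-k // tile_k)

    # One pipeline stage holds an A tile (tile_m x tile_k) and a B tile (tile_k x tile_n).
    smem_bytes_per_stage = (tile_m * tile_k + tile_k * tile_n) * bytes_per_element

    return {
        "grid": (grid_m, grid_n),
        "threadblocks": grid_m * grid_n,
        "k_iterations": k_iterations,
        "smem_bytes_per_stage": smem_bytes_per_stage,
    }


# Assumed example problem size; not taken from the skill text.
report = tile_report(4096, 4096, 4096)
print(report)
```

For the assumed 4096-cubed problem this works out to a 32 x 16 grid (512 threadblocks), 64 main-loop iterations, and 48 KiB of operand data per stage. Comparing such figures across candidate tile shapes is one quick way to rule out configurations that cannot fit the target GPU's shared memory before benchmarking the rest against cuBLAS.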

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Related Skills