cublas-cudnn

Solid

Expert integration with NVIDIA GPU-accelerated math libraries. Configure cuBLAS tensor core operations, generate cuBLAS GEMM calls, integrate cuDNN layers, handle algorithm selection, and support mixed-precision operations.

AI & Automation 814 stars 53 forks Updated today MIT

Quality Score: 95/100

Skill Content

# cublas-cudnn

You are **cublas-cudnn**, a specialized skill for NVIDIA GPU-accelerated math library integration. This skill provides expert capabilities for using cuBLAS, cuDNN, and related libraries.

## Overview

This skill enables AI-powered GPU library operations, including:

- Configure cuBLAS tensor core operations
- Generate cuBLAS GEMM calls with optimal parameters
- Integrate cuDNN convolution and normalization layers
- Handle cuBLAS/cuDNN algorithm selection
- Configure workspace memory requirements
- Benchmark library operations against custom kernels
- Support mixed-precision operations (FP16, TF32, INT8)
- Integrate with cuSPARSE for sparse operations

## Prerequisites

- CUDA Toolkit 11.0+
- cuBLAS library
- cuDNN 8.0+
- cuSPARSE (optional)

## Capabilities

### 1. cuBLAS GEMM Operations

Matrix multiplication with cuBLAS. cuBLAS stores matrices in column-major order, so each leading dimension is the row count of the stored matrix:

```c
#include <cublas_v2.h>

// Initialize cuBLAS
cublasHandle_t handle;
cublasCreate(&handle);

// Standard SGEMM: C = alpha * A * B + beta * C
// A is M x K, B is K x N, C is M x N, all column-major in device memory
float alpha = 1.0f, beta = 0.0f;
cublasSgemm(handle,
            CUBLAS_OP_N, CUBLAS_OP_N,  // No transpose
            M, N, K,                   // Dimensions
            &alpha,
            d_A, M,                    // A matrix and leading dimension
            d_B, K,                    // B matrix and leading dimension
            &beta,
            d_C, M);                   // C matrix and leading dimension

// Batched GEMM for multiple matrices
cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N, M, N, K, &alpha, d_Aarray, M, ...
```
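cuBLAS follows the BLAS column-major convention, and the leading-dimension arguments are a common source of bugs. One practical check is to copy the GPU result back and compare it against a simple CPU reference. The sketch below assumes non-transposed operands (`CUBLAS_OP_N` for both). The helper names `ref_sgemm_nn` and `ref_sgemm_nn_batched` are hypothetical. `IDX2C` imitates the column-major indexing macro used in NVIDIA's cuBLAS documentation examples. cuBLAS does not require `C` to hold valid data when `beta` is zero, so the reference also skips reading `C` in that case.

```c
#include <stddef.h>

/* Column-major index of element (row, col) in a matrix with leading dimension ld,
   in the style of the IDX2C macro from the cuBLAS documentation examples. */
#define IDX2C(row, col, ld) ((size_t)(col) * (size_t)(ld) + (size_t)(row))

/* Hypothetical CPU reference for C = alpha * A * B + beta * C with no transposes.
   A is m x k (lda >= m), B is k x n (ldb >= k), C is m x n (ldc >= m),
   all column-major, matching cublasSgemm with CUBLAS_OP_N, CUBLAS_OP_N. */
void ref_sgemm_nn(int m, int n, int k, float alpha,
                  const float *A, int lda,
                  const float *B, int ldb,
                  float beta, float *C, int ldc) {
    for (int j = 0; j < n; ++j) {          /* column of C */
        for (int i = 0; i < m; ++i) {      /* row of C */
            float acc = 0.0f;
            for (int p = 0; p < k; ++p)
                acc += A[IDX2C(i, p, lda)] * B[IDX2C(p, j, ldb)];
            /* When beta == 0, C is treated as output-only and never read. */
            float old = (beta == 0.0f) ? 0.0f : beta * C[IDX2C(i, j, ldc)];
            C[IDX2C(i, j, ldc)] = alpha * acc + old;
        }
    }
}

/* Hypothetical batched variant: one independent GEMM per (A, B, C) pointer
   triple, sharing dimensions, scalars, and leading dimensions. */
void ref_sgemm_nn_batched(int m, int n, int k, float alpha,
                          const float *const Aarray[], int lda,
                          const float *const Barray[], int ldb,
                          float beta, float *const Carray[], int ldc,
                          int batch_count) {
    for (int b = 0; b < batch_count; ++b)
        ref_sgemm_nn(m, n, k, alpha, Aarray[b], lda, Barray[b], ldb,
                     beta, Carray[b], ldc);
}
```

To use it, copy `d_C` back to the host with `cudaMemcpy` and compare it element-wise against the reference within a tolerance. The GPU may accumulate in a different order, or at reduced precision when tensor-core math is enabled.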
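TF32 is one of the listed mixed-precision modes. It keeps FP32's 8-bit exponent but only 10 explicit mantissa bits, so FP32 inputs lose precision when a GEMM runs on TF32 tensor cores. In cuBLAS (CUDA 11+) this mode can be requested with `cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH)`. The minimal sketch below estimates the effect on the CPU. The helpers `round_to_tf32` and `dot_tf32` are hypothetical. Round-to-nearest-even is an assumption, and the conversion used inside cuBLAS, cuDNN, or the hardware may differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round an FP32 value to TF32 precision by keeping the
   sign, the 8-bit exponent, and the top 10 of the 23 mantissa bits.
   ASSUMPTION: round-to-nearest-even; overflow rounds to infinity. */
float round_to_tf32(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);            /* reinterpret bits without aliasing issues */
    if ((bits & 0x7f800000u) == 0x7f800000u)   /* Inf or NaN: return unchanged */
        return x;
    uint32_t keep_lsb = (bits >> 13) & 1u;     /* lowest mantissa bit that survives */
    bits += 0x0fffu + keep_lsb;                /* round half to even on the 13 dropped bits */
    bits &= 0xffffe000u;                       /* clear the 13 dropped bits */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Hypothetical helper: dot product with TF32-rounded inputs and FP32
   accumulation, a rough CPU model of one output element of a TF32 GEMM. */
float dot_tf32(const float *a, const float *b, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += round_to_tf32(a[i]) * round_to_tf32(b[i]);
    return acc;
}
```

Comparing `dot_tf32` against a plain FP32 dot product on representative data gives a rough sense of how much accuracy TF32 trades for throughput, before deciding whether to enable it.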

Details

Author: a5c-ai
Repository: a5c-ai/babysitter
Created: 4 months ago
Last Updated: today
Language: JavaScript
License: MIT

Related Skills

AI & Automation Featured

videodb

See, Understand, Act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live desktop recording; return real-time context and playable stream links. Understand: extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

ck

Persistent per-project memory for Claude Code. Auto-loads project context on session start, tracks sessions with git activity, and writes to native memory. Commands run deterministic Node.js scripts — behavior is consistent across model versions.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

browser

Web browser automation with AI-optimized snapshots for claude-flow agents

55,973 Updated today

ruvnet