debug-cuda-crashlisted
Install: claude install-skill aiskillstore/marketplace
# Tutorial: Debugging CUDA Crashes with API Logging
This tutorial shows you how to debug CUDA crashes and errors in FlashInfer using the `@flashinfer_api` logging decorator.
## Goal
When your code crashes with CUDA errors (illegal memory access, out-of-bounds, NaN/Inf), use API logging to:
- Capture input tensors BEFORE the crash occurs
- Understand what data caused the problem
- Track tensor shapes, dtypes, and values through your pipeline
- Detect numerical issues (NaN, Inf, wrong shapes)
## Why Use API Logging?
**Problem**: CUDA errors often crash the program, leaving no debugging information.
**Solution**: FlashInfer's `@flashinfer_api` decorator logs inputs BEFORE execution, so you can see what caused the crash even after the program terminates.
## Step 1: Enable API Logging
### Basic Logging (Function Names Only)
```bash
export FLASHINFER_LOGLEVEL=1 # Log function names
export FLASHINFER_LOGDEST=stdout # Log to console
python my_script.py
```
Output:
```
[2025-12-18 10:30:45] FlashInfer API Call: batch_decode_with_padded_kv_cache
```
### Detailed Logging (Inputs/Outputs with Metadata)
```bash
export FLASHINFER_LOGLEVEL=3 # Log inputs/outputs with metadata
export FLASHINFER_LOGDEST=debug.log # Save to file
python my_script.py
```
Output in `debug.log`:
```
================================================================================
[2025-12-18 10:30:45] FlashInfer API Logging - System Information
=======================================