fabric-pandas-perf-remediate
Install: `claude install-skill PatrickGallucci/fabric-skills`
# Fabric Pandas Performance Troubleshooting
Diagnose and resolve pandas-related performance issues in Microsoft Fabric Spark notebooks, including memory exhaustion, slow conversions, and suboptimal pandas API on Spark usage.
## When to Use This Skill
- Notebook cells hang or time out during pandas operations
- `toPandas()` fails with OutOfMemoryError or Java heap space errors
- `collect()` crashes the driver node
- Pandas API on Spark (`pyspark.pandas` / `ps`) runs slower than expected
- DataFrame conversion between Spark and pandas causes memory spikes
- Notebook kernel restarts unexpectedly during data processing
- Large dataset operations exhaust driver memory on Fabric capacity
- You need to choose among pandas, Spark DataFrames, and the pandas API on Spark
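
As a rough illustration of the last point, the choice can be framed as a size check against driver memory. The sketch below is only a heuristic: the function name `recommend_engine`, the `safety_factor` default of 0.25, and the idea of estimating size as rows times average row bytes are assumptions made for illustration, not values taken from Fabric documentation.

```python
def recommend_engine(row_count, avg_row_bytes, driver_memory_bytes,
                     safety_factor=0.25, prefer_pandas_syntax=False):
    """Suggest an engine from a rough in-memory size estimate.

    Hypothetical heuristic: data is treated as pandas-friendly only if its
    estimated size fits within a fraction (safety_factor) of driver memory.
    """
    estimated_bytes = row_count * avg_row_bytes
    budget_bytes = driver_memory_bytes * safety_factor

    if estimated_bytes <= budget_bytes:
        # Small enough to collect to the driver; native pandas avoids Spark overhead.
        return "pandas"
    if prefer_pandas_syntax:
        # Too large for the driver, but pandas-style code is wanted.
        return "pyspark.pandas"
    # Too large for the driver; stay distributed with Spark DataFrames.
    return "spark"


# Example: one million rows of about 200 bytes each on a driver with 8 GiB.
print(recommend_engine(1_000_000, 200, 8 * 1024**3))
```

In a notebook, `row_count` could come from `df.count()`. The average row size has to be estimated separately, for example from a small sample, and the real pandas footprint is often larger than the raw data size.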
## Prerequisites
- Microsoft Fabric workspace with Data Engineering experience
- Fabric capacity F2 or higher (F64+ recommended for large datasets)
- PySpark notebook with Spark session active
- Basic familiarity with pandas and PySpark DataFrames
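
Before troubleshooting, it can help to confirm what the active session looks like. The following is a minimal sketch, assuming the notebook exposes the session as `spark` (as Fabric notebooks typically do). The helper name `describe_session` is hypothetical, and whether `spark.driver.memory` is readable at runtime may depend on how the environment was configured.

```python
def describe_session(spark):
    """Summarize session settings that matter for Spark-to-pandas conversions.

    `spark` is expected to be a SparkSession; unset keys are reported as "unset".
    """
    return {
        "spark_version": spark.version,
        # Driver memory bounds how much data toPandas() or collect() can return.
        "driver_memory": spark.conf.get("spark.driver.memory", "unset"),
        # Arrow-based transfer usually speeds up Spark <-> pandas conversion.
        "arrow_enabled": spark.conf.get(
            "spark.sql.execution.arrow.pyspark.enabled", "unset"
        ),
    }


# In a notebook cell with an active session:
# describe_session(spark)
```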
## Quick Diagnosis
### Symptom-to-Solution Map
| Symptom | Likely Cause | Jump To |
|---------|-------------|---------|
| `toPandas()` OOM error | Dataset too large for driver | [toPandas Optimization](#topandas-optimization) |
| Kernel restart during pandas op | Driver memory exhausted | [Driver Memory Tuning](#driver-memory-tuning) |
| `pyspark.pandas` slower than native pandas | Spark overhead on small data | [Right-Size Your Approach](#right-size-your-approach) |
| Slow gr