fabric-pandas-perf-remediate

Troubleshoot and optimize pandas performance in Microsoft Fabric Spark notebooks. Use when diagnosing slow pandas operations, toPandas() out-of-memory errors, pandas API on Spark (pyspark.pandas) bottlenecks, DataFrame conversion failures, collect() memory issues, driver memory exhaustion, notebook cell timeouts, or when optimizing pandas workloads for Fabric capacity. Covers pandas vs Spark DataFrame conversion, memory profiling, broadcast joins, shuffle tuning, resource profiles, and Native Execution Engine integration.
PatrickGallucci/fabric-skills · ★ 13 · AI & Automation · score 81
Install: `claude install-skill PatrickGallucci/fabric-skills`
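The description above lists `toPandas()` out-of-memory errors and driver memory exhaustion among the problems this skill targets. One common guard is to estimate the pandas footprint before collecting anything to the driver. The following is a minimal, pure-Python sketch of such a pre-flight check. The names `estimate_pandas_bytes` and `safe_to_collect`, the 2x overhead multiplier, and the 50% driver-memory budget are all hypothetical choices for illustration. They are not part of the skill or of any Fabric API.

```python
# Hypothetical pre-flight check before calling toPandas(); every threshold here is an assumption.


def estimate_pandas_bytes(row_count: int, bytes_per_row: int, overhead: float = 2.0) -> int:
    """Rough in-memory size estimate for a pandas copy of the data.

    `overhead` is an assumed multiplier covering the transient copies made
    during Spark-to-pandas conversion; tune it for your own workload.
    """
    if row_count < 0 or bytes_per_row < 0:
        raise ValueError("row_count and bytes_per_row must be non-negative")
    return int(row_count * bytes_per_row * overhead)


def safe_to_collect(
    row_count: int,
    bytes_per_row: int,
    driver_memory_bytes: int,
    budget_fraction: float = 0.5,
    overhead: float = 2.0,
) -> bool:
    """Return True if the estimated pandas footprint fits within a fraction of driver memory."""
    budget = driver_memory_bytes * budget_fraction
    return estimate_pandas_bytes(row_count, bytes_per_row, overhead) <= budget


# In a notebook (assumes `df` is a Spark DataFrame and a Spark session is active):
# sample = df.limit(1_000).toPandas()
# bytes_per_row = int(sample.memory_usage(deep=True).sum() / max(len(sample), 1))
# if safe_to_collect(df.count(), bytes_per_row, driver_memory_bytes=16 * 1024**3):
#     pdf = df.toPandas()
# else:
#     ...  # keep the work in Spark DataFrames or pyspark.pandas instead
```

Keeping the decision logic free of Spark makes it easy to unit test. In a notebook you would feed it `df.count()` and a per-row size measured on a small sample, as the commented lines show. The 16 GiB driver figure there is only a placeholder. Enabling Arrow-based conversion (`spark.sql.execution.arrow.pyspark.enabled`) is a separate, commonly used Spark setting. It reduces conversion cost, but it does not change how much data ends up on the driver.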
# Fabric Pandas Performance Troubleshooting

Diagnose and resolve pandas-related performance issues in Microsoft Fabric Spark notebooks, including memory exhaustion, slow conversions, and suboptimal use of the pandas API on Spark.

## When to Use This Skill

- Notebook cells hang or time out during pandas operations
- `toPandas()` fails with OutOfMemoryError or Java heap space errors
- `collect()` crashes the driver node
- Pandas API on Spark (`pyspark.pandas` / `ps`) runs slower than expected
- DataFrame conversion between Spark and pandas causes memory spikes
- The notebook kernel restarts unexpectedly during data processing
- Large dataset operations exhaust driver memory on Fabric capacity
- You need to choose between pandas, Spark DataFrames, or the pandas API on Spark

## Prerequisites

- Microsoft Fabric workspace with the Data Engineering experience
- Fabric capacity F2 or higher (F64+ recommended for large datasets)
- PySpark notebook with an active Spark session
- Basic familiarity with pandas and PySpark DataFrames

## Quick Diagnosis

### Symptom-to-Solution Map

| Symptom | Likely Cause | Jump To |
|---------|-------------|---------|
| `toPandas()` OOM error | Dataset too large for driver | [toPandas Optimization](#topandas-optimization) |
| Kernel restart during pandas op | Driver memory exhausted | [Driver Memory Tuning](#driver-memory-tuning) |
| `pyspark.pandas` slower than native pandas | Spark overhead on small data | [Right-Size Your Approach](#right-size-your-approach) |
| Slow gr