databricks-performance-tuning

Optimize Databricks cluster and query performance. Use when jobs run slowly, when tuning Spark configurations, or when improving Delta Lake query performance. Trigger with phrases like "databricks performance", "spark tuning", "databricks slow", "optimize databricks", "cluster performance".

AI & Automation 2,359 stars 334 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Databricks Performance Tuning

## Overview

Optimize Databricks cluster sizing, Spark configuration, and Delta Lake query performance. Covers workload-specific Spark configs, Adaptive Query Execution (AQE), Liquid Clustering, Z-ordering, OPTIMIZE/VACUUM maintenance, query plan analysis, and caching strategies.

## Prerequisites

- Access to cluster configuration (admin or cluster owner)
- Understanding of workload type (ETL batch, ML training, streaming, interactive)
- Query history access for identifying slow queries

## Instructions

### Step 1: Cluster Sizing by Workload

| Workload | Instance Family | Why | Workers |
|----------|----------------|-----|---------|
| ETL Batch | Compute-optimized (c5/c6) | CPU-heavy transforms | 2-8, autoscale |
| ML Training | Memory-optimized (r5/r6) | Large model fits | 4-16, fixed |
| Streaming | Compute-optimized (c5) | Sustained throughput | 2-4, fixed |
| Interactive / Ad-hoc | General-purpose (m5) | Balanced | Single node or 1-4 |
| Heavy shuffle / spill | Storage-optimized (i3) | Fast local NVMe | 4-8 |

```python
def recommend_cluster(data_size_gb: float, workload: str) -> dict:
    """Recommend cluster config based on data size and workload type."""
    configs = {
        "etl_batch": {"node": "c5.2xlarge", "memory_gb": 16, "multiplier": 1.5},
        "ml_training": {"node": "r5.2xlarge", "memory_gb": 64, "multiplier": 2.0},
        "streaming": {"node": "c5.xlarge", "memory_gb": 8, "multiplier": 1.0},
        "interactive": {"no...
```
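The worker counts in the Step 1 table can be turned into the sizing fragment of a cluster definition. The sketch below is illustrative only and is not the skill's own `recommend_cluster` implementation. The names `WorkerRange`, `WORKER_RANGES`, `worker_spec`, and the `heavy_shuffle` key are hypothetical. It also assumes that every row not marked "autoscale" in the table is a fixed-size cluster, and it does not model the single-node option for interactive work. The output keys (`autoscale`, `min_workers`, `max_workers`, `num_workers`) follow the field names used by Databricks cluster specifications.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerRange:
    """Worker bounds for one workload row of the sizing table."""
    min_workers: int
    max_workers: int
    autoscale: bool


# Ranges copied from the Step 1 table. Only the ETL row says "autoscale";
# treating the remaining rows as fixed-size is an assumption of this sketch.
WORKER_RANGES = {
    "etl_batch": WorkerRange(2, 8, autoscale=True),
    "ml_training": WorkerRange(4, 16, autoscale=False),
    "streaming": WorkerRange(2, 4, autoscale=False),
    "interactive": WorkerRange(1, 4, autoscale=False),    # single-node option not modeled
    "heavy_shuffle": WorkerRange(4, 8, autoscale=False),  # hypothetical key name
}


def worker_spec(workload: str, fixed_workers: Optional[int] = None) -> dict:
    """Return the worker-sizing fragment of a cluster spec for a workload.

    Autoscaling workloads get an ``autoscale`` block covering the full range.
    Fixed workloads get ``num_workers``, defaulting to the low end of the
    range unless the caller picks a value inside it.
    """
    if workload not in WORKER_RANGES:
        raise ValueError(f"unknown workload: {workload!r}")
    bounds = WORKER_RANGES[workload]

    if bounds.autoscale:
        return {
            "autoscale": {
                "min_workers": bounds.min_workers,
                "max_workers": bounds.max_workers,
            }
        }

    count = bounds.min_workers if fixed_workers is None else fixed_workers
    if not bounds.min_workers <= count <= bounds.max_workers:
        raise ValueError(
            f"{workload} expects {bounds.min_workers}-{bounds.max_workers} "
            f"workers, got {count}"
        )
    return {"num_workers": count}
```

For example, `worker_spec("etl_batch")` yields an autoscale block spanning 2 to 8 workers, while `worker_spec("streaming", 3)` yields a fixed three-worker fragment. Keeping the ranges in one table makes it easy to reject sizes that fall outside the guidance above.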

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 8 months ago
Last Updated: today
Language: Python
License: MIT
