fabric-spark
Install: claude install-skill wardawgmalvicious/claude-config
# Spark / PySpark in Fabric
## Key Constraints
- Fabric Spark cannot read from arbitrary external HTTP/HTTPS URLs; land the data in the lakehouse `Files/` area first (via a pipeline Copy activity, the OneLake API, or curl)
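- Use the `abfss://` URI format for OneLake paths in Spark: `abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{item}.Lakehouse/{path}`, where `{path}` typically starts with `Files/` or `Tables/`
- Use the workspace GUID, not the workspace name, for `{workspace}` in ABFS URIs; spaces are not allowed there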
- Use `notebookutils` (formerly `mssparkutils`, which is kept for backward compatibility) for Fabric-specific notebook operations: credentials, secrets, and file management
- Use Delta Lake format for all Lakehouse tables
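A minimal sketch of building these URIs in Python, following the template above with a GUID workspace and a `{item}.Lakehouse` segment. `onelake_abfss_path`, the placeholder GUID, and the lakehouse name `sales_lh` are hypothetical; the helper only rejects non-GUID workspace IDs and whitespace in the item name, and does not check that the item exists.

```python
import uuid

# OneLake DFS endpoint, as used in the abfss:// template above.
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_abfss_path(workspace_id: str, item_name: str, path: str,
                       item_type: str = "Lakehouse") -> str:
    """Build abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{item}.{item_type}/{path}.

    Hypothetical helper: enforces a GUID workspace and no whitespace in the
    item name, per the guidance above. It does not contact OneLake.
    """
    try:
        # uuid.UUID raises ValueError for anything that is not a valid GUID;
        # str() normalizes it to lowercase hyphenated form.
        ws = str(uuid.UUID(workspace_id))
    except ValueError:
        raise ValueError(f"workspace must be a GUID, got {workspace_id!r}") from None
    if any(ch.isspace() for ch in item_name):
        raise ValueError(f"item name must not contain whitespace: {item_name!r}")
    # Drop leading slashes so the join never yields '//' after the item segment.
    rel = path.lstrip("/")
    return f"abfss://{ws}@{ONELAKE_HOST}/{item_name}.{item_type}/{rel}"


# Example usage inside a Fabric notebook (where `spark` is already defined):
# ws = "00000000-0000-0000-0000-000000000000"  # placeholder workspace GUID
# raw = onelake_abfss_path(ws, "sales_lh", "Files/raw/orders.csv")
# df = spark.read.option("header", "true").csv(raw)
```

Passing a workspace name such as `"My Workspace"` raises `ValueError` up front instead of producing a URI that fails later inside Spark.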
## Runtime Context vs Spark Session Config
These are two different mechanisms that are often confused:
| Need | API |
|---|---|
| Workspace / item identity (workspace ID + name, notebook ID + name, default lakehouse ID + name, userId) | `notebookutils.runtime.context["currentWorkspaceId"]` (etc.) — a dict, documented public API, works in pure-Python notebooks |
| Spark session tuning (shuffle partitions, AQE, Delta settings, case sensitivity) | `spark.conf.set(...)` / `spark.conf.get(...)` |
`spark.conf.get("trident.workspace.id")` also returns the workspace ID, but it is an internal Spark configuration key rather than a documented API, and it is unavailable in pure-Python notebooks. Prefer `notebookutils.runtime.context` for identity lookups and reserve `spark.conf.*` for session tuning.
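A minimal sketch of keeping the two concerns apart. `workspace_identity` and `tune_session` are hypothetical helpers; only the `currentWorkspaceId` key is named in the table above, so the other context key names are assumptions to check against `notebookutils.runtime.context` in your runtime. The two Spark keys are standard Spark SQL settings, and the values are illustrative rather than recommendations.

```python
from typing import Any, Mapping

# Keys read from notebookutils.runtime.context. Only "currentWorkspaceId" is
# named in the text above; the other key names are assumptions to verify.
IDENTITY_KEYS = (
    "currentWorkspaceId",
    "currentWorkspaceName",
    "defaultLakehouseId",
    "defaultLakehouseName",
)


def workspace_identity(ctx: Mapping[str, Any]) -> dict:
    """Identity lookup: read from the runtime-context dict, not spark.conf.

    Missing keys come back as None instead of raising.
    """
    return {key: ctx.get(key) for key in IDENTITY_KEYS}


def tune_session(spark: Any, shuffle_partitions: int = 200) -> None:
    """Session tuning: this is what spark.conf.set is for."""
    spark.conf.set("spark.sql.shuffle.partitions", str(shuffle_partitions))
    spark.conf.set("spark.sql.adaptive.enabled", "true")


# Example usage inside a Fabric notebook (assuming `notebookutils` and
# `spark` are already in scope there):
# ident = workspace_identity(notebookutils.runtime.context)
# tune_session(spark, shuffle_partitions=64)
```

Because `workspace_identity` only needs a mapping, it works the same way in a pure-Python notebook, where no `spark` session exists.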
## Lakehouse Setup
- **`enableSchemas` is set at lakehouse creation time only** — cannot be retrofitted. Without it the lakehouse only has the default `dbo` schema and you must recre