
fabric-spark

Use for PySpark / Spark in Microsoft Fabric notebooks. Covers the no-external-HTTP constraint (land data in Files/ first), abfss:// URI format for OneLake (GUIDs not names), `notebookutils.runtime.context` for identity lookups vs `spark.conf.*` for session tuning, mssparkutils, lakehouse `enableSchemas` immutability and cross-lakehouse 3-part names, table maintenance (OPTIMIZE/VACUUM/V-Order) impact on SQL Endpoint, Delta Lake default, REST notebook upload quirks (bare-string source `400 exceptionCulprit:1`, `metadata.dependencies.lakehouse` for default-lakehouse binding, 411 on empty-body getDefinition, `/result` LRO suffix, `?updateMetadata=true` requires `.platform`), notebook-execution gotchas (`defaultLakehouse` needs id+name, never retry POST), and in-notebook auto-restart via `%%configure retriableOptions { enabled, maxAttempt }` (April 2026, for pipeline-driven runs).
wardawgmalvicious/claude-config · ★ 1 · API & Backend · score 75
Install: claude install-skill wardawgmalvicious/claude-config
# Spark / PySpark in Fabric

## Key Constraints

- Fabric Spark cannot access arbitrary external HTTP/HTTPS URLs; land data in lakehouse `Files/` first (via pipeline Copy activity, OneLake API, or curl)
- Use `abfss://` URI format for OneLake paths in Spark: `abfss://{workspace}@onelake.dfs.fabric.microsoft.com/{item}.Lakehouse/{path}`
- Use workspace GUIDs (not names) in ABFS URIs, because spaces are not allowed
- `mssparkutils` for Fabric-specific notebook operations (credentials, secrets, file management)
- Use Delta Lake format for all Lakehouse tables

## Runtime Context vs Spark Session Config

Two different things that are often confused:

| Need | API |
|---|---|
| Workspace / item identity (workspace ID + name, notebook ID + name, default lakehouse ID + name, userId) | `notebookutils.runtime.context["currentWorkspaceId"]` (etc.): a dict, documented public API, works in pure-Python notebooks |
| Spark session tuning (shuffle partitions, AQE, Delta settings, case sensitivity) | `spark.conf.set(...)` / `spark.conf.get(...)` |

`spark.conf.get("trident.workspace.id")` also returns the workspace ID but is internal Spark conf, not documented surface, and is unavailable in pure-Python notebooks. Prefer `notebookutils.runtime.context` for identity lookups; reserve `spark.conf.*` for session tuning.

## Lakehouse Setup

- **`enableSchemas` is set at lakehouse creation time only** and cannot be retrofitted. Without it the lakehouse only has the default `dbo` schema and you must recre