dask

Featured

Distributed computing for larger-than-RAM pandas/NumPy workflows. Use when you need to scale existing pandas/NumPy code beyond memory or across clusters. Best for parallel file processing, distributed ML, and integration with existing pandas code. For out-of-core analytics on a single machine, use vaex; for in-memory speed, use polars.

AI & Automation · 31,883 stars · 3,168 forks · Updated today · MIT


Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Dask

## Overview

Dask is a Python library for parallel and distributed computing that enables three critical capabilities:

- **Larger-than-memory execution** on single machines for data exceeding available RAM
- **Parallel processing** for improved computational speed across multiple cores
- **Distributed computation** supporting terabyte-scale datasets across multiple machines

Dask scales from laptops (processing ~100 GiB) to clusters (processing ~100 TiB) while maintaining familiar Python APIs.

**Current upstream:** dask **2026.3.0** (PyPI, March 2026). Docs: [docs.dask.org](https://docs.dask.org/en/stable/). Since **2025.1.0**, the expression-based DataFrame API with query planning is the only implementation, so do not install `dask-expr` separately or set `dataframe.query-planning: False`.

## Quick Start

### Installation

```bash
uv pip install "dask>=2025.1"
```

For a typical pandas/NumPy workflow with the distributed scheduler and dashboard:

```bash
uv pip install "dask[complete]"
```

Remote object storage (S3, GCS, Azure):

```bash
uv pip install s3fs   # s3:// paths
uv pip install gcsfs  # gs:// paths
```

Requires **Python 3.10+** (3.9 support dropped in 2024.12). DataFrame I/O requires **PyArrow 16+** (as of dask 2026.1.2).

## When to Use This Skill

Use this skill to:

- Process datasets that exceed available RAM
- Scale pandas or NumPy operations to larger datasets
- Parallelize computations for performance improvements
- Process multiple f...
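
The version requirements stated under Installation (Python 3.10+, `dask>=2025.1`, PyArrow 16+) can be checked before a workflow starts. The following is a minimal sketch using only the standard library. The helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical and not part of Dask, and the minimums are copied from the text above rather than read from Dask's own package metadata.

```python
import re
import sys
from importlib import metadata

# Minimums copied from the Installation notes above (an assumption of this
# sketch; they are not read from Dask's own package metadata).
MIN_PYTHON = (3, 10)
MIN_VERSIONS = {"dask": (2025, 1), "pyarrow": (16,)}


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev1"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version tuples, padding the shorter one with zeros."""
    width = max(len(installed), len(minimum))

    def pad(version):
        return version + (0,) * (width - len(version))

    return pad(installed) >= pad(minimum)


def check_environment():
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.10+ is required")
    for name, minimum in MIN_VERSIONS.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            wanted = ".".join(str(part) for part in minimum)
            problems.append(f"{name} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script prints one line per problem and nothing when every check passes. Plain tuple comparison keeps the sketch dependency-free; a stricter check would use a dedicated version-parsing library.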

Details

Author: K-Dense-AI
Repository: K-Dense-AI/scientific-agent-skills
Created: 9 months ago
Last Updated: today
Language: Python
License: MIT
