alterlab-cellxgene

Solid

Query the CZ CELLxGENE Census (61M+ cells) programmatically via cellxgene-census and TileDB-SOMA, slicing expression by tissue, disease, or cell type and returning AnnData. Use when pulling reference single-cell RNA-seq data from the largest curated public atlas, running population-scale queries, or benchmarking your data against a reference — for analyzing your own dataset use scanpy or scvi-tools. Part of the AlterLab Academic Skills suite.

AI & Automation · 27 stars · 4 forks · Updated today · MIT

Quality Score: 87/100

Stars (20%): 48
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# CZ CELLxGENE Census

## Overview

The CZ CELLxGENE Census provides programmatic, versioned access to standardized single-cell genomics data from CZ CELLxGENE Discover. It contains **61+ million cells** (human and mouse) with standardized metadata (cell types, tissues, diseases, donors), raw gene expression matrices, pre-calculated embeddings, and integration with PyTorch, scanpy, and other analysis tools.

## When to Use This Skill

Use this skill when:

- Querying single-cell expression data by cell type, tissue, or disease
- Exploring available single-cell datasets and metadata
- Training machine learning models on single-cell data
- Performing large-scale cross-dataset analyses
- Integrating Census data with scanpy or other analysis frameworks
- Computing statistics across millions of cells
- Accessing pre-calculated embeddings or model predictions

For analyzing **your own** dataset (not the reference atlas), use scanpy or scvi-tools instead.

## Installation

```bash
uv pip install cellxgene-census

# For PyTorch ML workflows (loaders moved out of cellxgene-census):
uv pip install tiledbsoma-ml
```

## Core Workflow

1. **Open the Census** with a context manager; pin `census_version` for reproducibility.
2. **Explore metadata first** (`get_obs` / datasets summary) to understand what's available; always filter `is_primary_data == True` to avoid duplicate cells.
3. **Estimate query size** before loading expression. < 100k cells → `get_anndata()` (in-memory); larger → `axi...
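
The first three workflow steps can be sketched roughly as below. This is a minimal illustration, not the skill's own implementation: the helper names (`build_obs_filter`, `fits_in_memory`, `load_reference`) are hypothetical, the version string `"2023-12-15"` is only an example pin, and the obs columns `tissue_general`, `cell_type`, and `disease` are assumed to match the Census schema in the release you pin. The excerpt is cut off before it names the out-of-core path for larger queries, so the sketch stops with an error in that case instead of guessing.

```python
"""Sketch: pin a Census version, check metadata first, size the query, then load."""

IN_MEMORY_LIMIT = 100_000  # "< 100k cells" threshold from the Core Workflow


def build_obs_filter(tissue_general=None, cell_type=None, disease=None):
    """Compose an obs value filter that always keeps primary data only.

    Hypothetical helper; assumes plain values without embedded quotes.
    """
    clauses = ["is_primary_data == True"]
    for column, value in (
        ("tissue_general", tissue_general),
        ("cell_type", cell_type),
        ("disease", disease),
    ):
        if value is not None:
            clauses.append(f"{column} == '{value}'")
    return " and ".join(clauses)


def fits_in_memory(n_cells, limit=IN_MEMORY_LIMIT):
    """True when the query is small enough for an in-memory AnnData."""
    return n_cells < limit


def load_reference(obs_filter, census_version="2023-12-15", organism="Homo sapiens"):
    """Count matching cells from metadata, then load expression if small enough."""
    # Imported lazily so the pure helpers above work without the package installed.
    import cellxgene_census

    # Step 1: context manager plus a pinned version (example value, adjust as needed).
    with cellxgene_census.open_soma(census_version=census_version) as census:
        # Step 2: metadata only; one narrow column is enough to count cells.
        obs = cellxgene_census.get_obs(
            census, organism, value_filter=obs_filter, column_names=["soma_joinid"]
        )
        # Step 3: decide on the loading strategy before touching expression data.
        if not fits_in_memory(len(obs)):
            raise ValueError(
                f"{len(obs)} cells matched; use an out-of-core approach instead."
            )
        return cellxgene_census.get_anndata(
            census, organism=organism, obs_value_filter=obs_filter
        )
```

A call such as `load_reference(build_obs_filter(tissue_general="lung", disease="normal"))` would return an AnnData object when the match count is under the limit; it needs network access and the `cellxgene-census` package.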

Details

Author: AlterLab-IEU
Repository: AlterLab-IEU/AlterLab-Academic-Skills
Created: 2 months ago
Last Updated: today
Language: Python
License: MIT
