arboreto-grn-inferencelisted
Install: claude install-skill jaechang-hits/SciAgent-Skills
# Arboreto GRN Inference
## Overview
Arboreto infers gene regulatory networks (GRNs) from gene expression data using parallelized tree-based regression. For each target gene, it trains a regression model with all other genes (or a specified TF list) as features and emits TF-target-importance triplets. It provides two interchangeable algorithms -- GRNBoost2 (gradient boosting, fast) and GENIE3 (Random Forest, classic) -- sharing identical input/output formats. Computation is Dask-parallelized, scaling from laptop cores to HPC clusters.
## When to Use
- Inferring transcription factor-to-target gene regulatory relationships from bulk RNA-seq expression data
- Building gene regulatory networks from single-cell RNA-seq count matrices (cells as rows, genes as columns)
- Generating the adjacency matrix (Step 1) of the pySCENIC regulatory analysis pipeline
- Comparing regulatory network structure across experimental conditions (e.g., control vs treatment)
- Producing consensus regulatory networks by running inference across multiple random seeds
- Validating GRN results by comparing GRNBoost2 and GENIE3 outputs on the same dataset
- For downstream regulon identification and activity scoring, use arboreto output with pySCENIC
- For single-cell preprocessing (QC, normalization, clustering) before GRN inference, use **scanpy-scrna-seq**
## Prerequisites
- **Python packages**: `arboreto`, `pandas`, `numpy`, `dask`, `distributed`, `scikit-learn`, `scipy`
- **Data requirements**: Gen