ammawla

setup

Set up the ENCODE Toolkit server connection. Use when the user needs help installing, configuring, or troubleshooting the ENCODE connector.

Build comprehensive chromatin accessibility maps by aggregating ATAC-seq and DNase-seq narrowPeak data across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "where is chromatin accessible in my tissue?" by combining peak calls into a union peak set. Handles cross-lab variation, ATAC vs DNase platform differences, and ENCODE blocklist filtering.
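
The union peak set mentioned above can be illustrated with a minimal sketch. It assumes peaks are BED-style (chrom, start, end) tuples with half-open coordinates and merges only strictly overlapping intervals. The function name `union_peaks` and the per-region experiment count are hypothetical illustrations, not the skill's actual implementation.

```python
from typing import Iterable, List, Tuple

# BED-style peak: (chrom, start, end), 0-based half-open coordinates (assumed).
Peak = Tuple[str, int, int]


def union_peaks(peak_sets: Iterable[Iterable[Peak]]) -> List[Tuple[str, int, int, int]]:
    """Merge overlapping peaks from several experiments into union regions.

    Returns (chrom, start, end, n_experiments), where n_experiments is the
    number of distinct input peak sets that contributed to the region.
    """
    # Tag every peak with the index of the experiment it came from.
    tagged = [
        (chrom, start, end, idx)
        for idx, peaks in enumerate(peak_sets)
        for chrom, start, end in peaks
    ]
    tagged.sort(key=lambda p: (p[0], p[1], p[2]))

    merged: list = []
    for chrom, start, end, idx in tagged:
        # Strict overlap on the same chromosome extends the current region.
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3].add(idx)
        else:
            merged.append([chrom, start, end, {idx}])

    return [(c, s, e, len(ids)) for c, s, e, ids in merged]


if __name__ == "__main__":
    atac = [("chr1", 100, 200), ("chr1", 500, 600)]
    dnase = [("chr1", 150, 250), ("chr2", 10, 50)]
    for region in union_peaks([atac, dnase]):
        print(region)
```

The experiment count per region is one simple way to attach a confidence annotation: regions supported by more experiments are less likely to reflect single-lab noise.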

batch-analysis

Guide for multi-experiment batch operations: QC screening, batch download, comparison, and report generation across many ENCODE experiments simultaneously. Use when users need to process 5+ experiments together, create experiment comparison tables, perform batch quality checks, or generate summary reports. Trigger on: batch analysis, multiple experiments, bulk processing, experiment comparison, batch QC, multi-sample, batch download, experiment table, summary report, collection analysis.

DevOps & Infrastructure Listed

bioinformatics-installer

Install bioinformatics tools for ENCODE data analysis. Covers CLI tools (BWA, STAR, samtools, MACS2), R/Bioconductor packages (DESeq2, Seurat, ChIPseeker), Python packages (Scanpy, deeptools), and Nextflow pipeline infrastructure. Generates conda environments, R install scripts, and Python requirements. Use when the user needs to set up a bioinformatics workstation, install tools for a specific assay, create reproducible environments, or troubleshoot dependency issues. Trigger on: install tools, set up environment, conda create, bioinformatics setup, install R packages, install Bioconductor, install pipeline tools.

cellxgene-context

Guide for integrating CellxGene Census single-cell data with ENCODE bulk experiments. Use when users need cell-type-specific expression context for ENCODE regulatory data, want to deconvolve bulk ENCODE signals, or validate regulatory elements at single-cell resolution. Trigger on: CellxGene, single-cell atlas, cell type expression, Census, cell type specificity, single-cell context, scRNA-seq atlas.

DevOps & Infrastructure Listed

cite-encode

Generate proper ENCODE citations for publications, grants, and presentations. Use when the user needs to cite ENCODE data, create bibliography entries, write acknowledgment sections, or ensure compliance with ENCODE data use policy.

clinvar-annotation

Guide for annotating ENCODE regulatory variants with ClinVar clinical significance. Use when users need to check if variants in ENCODE peaks have clinical associations, find pathogenic variants in regulatory regions, or assess variant clinical impact. Trigger on: ClinVar, clinical significance, pathogenic variant, variant classification, clinical variant, disease variant, VUS, benign, likely pathogenic.

compare-biosamples

Compare ENCODE experiments across different biosamples, tissues, or cell lines to identify tissue-specific regulatory patterns. Use when the user wants cross-tissue comparison, cell-type comparison, tissue-specific elements, differential chromatin, biosample matching, disease vs normal comparison, developmental time course, constitutive vs variable regulation, or multi-tissue data availability mapping. Handles batch effect detection, biosample hierarchy, and comparison design.

cross-reference

Cross-reference ENCODE data with PubMed, bioRxiv, ClinicalTrials.gov, Open Targets, GTEx, ClinVar, GWAS Catalog, gnomAD, Ensembl, and other scientific databases. Use when the user wants to find publications, preprints, or clinical trials related to ENCODE experiments, chain ENCODE data with other scientific MCP servers, or build translational pipelines from genomic data to clinical application.

disease-research

Use ENCODE functional genomics data for disease mechanism research. Use when the user wants to connect GWAS variants to regulatory elements, annotate disease-associated loci with functional data, identify therapeutic targets from epigenomic data, build disease regulatory models, cross-reference with clinical trials and drug databases, or conduct any disease-focused, pathology-driven, or clinical variant interpretation workflow. Covers the full pipeline from disease-tissue mapping through GWAS variant annotation, heritability enrichment, cancer epigenomics, drug target identification, and clinical trial cross-referencing. Integrates ENCODE with Open Targets, PubMed, ClinicalTrials.gov, and bioRxiv.

download-encode

Download ENCODE genomics files (BED, FASTQ, BAM, bigWig, etc.) to the user's machine. Use when the user wants to download data files from ENCODE experiments.

API & Backend Listed

ensembl-annotation

Query the Ensembl REST API for regulatory feature annotations, variant effect prediction (VEP), coordinate liftover, gene lookups, and cross-references. Use when the user needs to annotate variants with VEP (consequence, CADD, REVEL, SpliceAI), check Ensembl Regulatory Build overlap for ENCODE regions, convert coordinates between GRCh37 and GRCh38, resolve gene IDs (Ensembl ↔ symbol ↔ RefSeq), look up gene phenotype associations, or cross-reference ENCODE targets with Ensembl annotations. Also use when the user mentions Ensembl, VEP, variant effect predictor, liftover, assembly conversion, regulatory build, gene lookup, or cross-references between databases.

epigenome-profiling

Build comprehensive epigenomic profiles for tissues or cell types using ENCODE data. Use when the user wants to characterize chromatin states, assemble histone modification panels, create epigenomic landscapes, run ChromHMM segmentation, identify super-enhancers or bivalent domains, profile regulatory elements across a biosample, or understand epigenetic regulation in a specific biological context. Covers histone marks, chromatin accessibility, TF binding, transcription, DNA methylation, and 3D genome structure.

functional-screen-analysis

Analyze ENCODE functional genomics screens including CRISPR screens, MPRA (Massively Parallel Reporter Assays), and STARR-seq. Find screen data in ENCODE, process results, identify functional elements, and integrate with epigenomic annotations.

geo-connector

Search, query, and cross-reference NCBI GEO (Gene Expression Omnibus) datasets with ENCODE experiments. Use when the user wants to find GEO accessions for ENCODE experiments, search GEO for complementary datasets, download GEO metadata or series matrices, cross-reference ENCODE and GEO data, find supplementary files from GEO, or link GEO series to ENCODE experiments for provenance tracking. Also use when the user mentions GEO, GSE, GSM, GPL, GDS, series matrix, SOFT format, or needs to find expression data in GEO that complements their ENCODE analysis.

gnomad-variants

Query gnomAD (Genome Aggregation Database) for population allele frequencies, gene constraint scores, and variant annotations to interpret ENCODE regulatory variants. Use when the user needs allele frequencies for variants in ENCODE regulatory elements, wants to assess gene constraint (pLI, LOEUF) for ENCODE target genes, needs population-specific frequencies for GWAS variants overlapping cCREs, wants to filter variants by rarity before functional annotation, or is interpreting ENCODE CRISPR/MPRA results in the context of population genetics. Also use when the user mentions gnomAD, allele frequency, pLI, LOEUF, constraint, rare variants, population frequency, ExAC, or variant filtering.

API & Backend Listed

gtex-expression

Guide for integrating GTEx tissue expression data with ENCODE regulatory elements. Use when users need to check if a gene is expressed in a tissue, correlate regulatory elements with expression, or validate ENCODE findings against GTEx. Trigger on: GTEx, tissue expression, gene expression levels, expression atlas, eQTL, tissue-specific expression, TPM values.

gwas-catalog

Guide for integrating NHGRI-EBI GWAS Catalog associations with ENCODE regulatory data. Use when users need to find GWAS variants in ENCODE peaks, connect regulatory elements to disease associations, or prioritize functional variants using ENCODE annotations. Trigger on: GWAS, genome-wide association, SNP association, trait association, GWAS Catalog, disease association, risk variant, lead SNP, LD proxy.

hic-aggregation

Build comprehensive chromatin contact maps by aggregating Hi-C loop calls (BEDPE) across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "what regions are in 3D contact in my tissue?" by creating a union catalog of chromatin loops. Handles resolution-aware anchor matching, cross-lab variation, and Hi-C-specific quality metrics.

histone-aggregation

Build comprehensive histone mark maps by aggregating narrowPeak data across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "where is this histone mark present in my tissue?" by combining peak calls from multiple studies into a union peak set with confidence annotations. Handles cross-lab batch effects, broad vs narrow marks, and ENCODE blocklist filtering.

integrative-analysis

Plan and execute integrative analysis combining multiple ENCODE experiments for cross-dataset or multi-omic workflows. Use when the user wants to combine experiments, perform cross-dataset comparison, multi-omic integration, peak overlap analysis, differential binding, signal correlation, chromatin state segmentation, enhancer-gene linkage, or any analysis that requires merging or comparing data from two or more ENCODE experiments. Covers same-assay cross-sample, multi-omic same-sample, cross-organism, and perturbation integration designs. Guides compatibility checks, batch effect detection, normalization, integration strategy selection, and provenance documentation.

jaspar-motifs

Guide for using JASPAR transcription factor binding profiles with ENCODE ChIP-seq data. Use when users need to find TF binding motifs in ENCODE peaks, validate ChIP-seq targets with known motifs, or scan regulatory regions for TF binding potential. Trigger on: JASPAR, motif database, binding profile, PWM, position weight matrix, TF motif, motif enrichment, motif scanning, binding site prediction.

liftover-coordinates

Convert genomic coordinates between assembly versions (GRCh37/hg19 to GRCh38/hg38, mm9 to mm10). Guides the use of UCSC liftOver for BED files and CrossMap for VCF/bigWig files, and handles unmapped regions with provenance logging.

methylation-aggregation

Build comprehensive DNA methylation maps by aggregating WGBS (Whole Genome Bisulfite Sequencing) data across multiple ENCODE experiments, donors, and labs. Use when the user wants to answer "where is DNA methylated/unmethylated in my tissue?" by combining per-CpG methylation data into tissue-level methylation profiles. Handles coverage filtering, identifies hypomethylated regions (HMRs) and partially methylated domains (PMDs), and manages cross-lab variation.
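
Coverage filtering and pooling of per-CpG data might look like the following sketch. It assumes each experiment is a mapping from a CpG site to (methylated reads, total reads). The function name `aggregate_methylation` and the default threshold of 10 reads are hypothetical choices for illustration, not values taken from the skill.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]     # (chrom, position) of a CpG
Counts = Tuple[int, int]   # (methylated reads, total reads)


def aggregate_methylation(
    experiments: Iterable[Dict[Site, Counts]],
    min_coverage: int = 10,  # hypothetical default threshold
) -> Dict[Site, float]:
    """Pool per-CpG counts across experiments after coverage filtering.

    A site's observation in one experiment is dropped when its coverage is
    below min_coverage. The remaining counts are summed, so the result is a
    coverage-weighted methylation fraction per site.
    """
    if min_coverage < 1:
        raise ValueError("min_coverage must be at least 1")

    totals = defaultdict(lambda: [0, 0])
    for experiment in experiments:
        for site, (methylated, coverage) in experiment.items():
            if coverage < min_coverage:
                continue  # too few reads to trust this observation
            totals[site][0] += methylated
            totals[site][1] += coverage

    return {site: m / c for site, (m, c) in totals.items()}


if __name__ == "__main__":
    exp1 = {("chr1", 100): (8, 10), ("chr1", 200): (1, 3)}
    exp2 = {("chr1", 100): (18, 30), ("chr1", 200): (10, 20)}
    print(aggregate_methylation([exp1, exp2]))
```

Summing counts rather than averaging per-experiment fractions lets deeply sequenced experiments contribute more; an unweighted mean is a reasonable alternative when donors should count equally.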

motif-analysis

Guide for de novo and known motif enrichment analysis of ENCODE ChIP-seq and ATAC-seq peaks using HOMER and MEME Suite. Use when users need to discover TF binding motifs in peaks, validate ChIP-seq targets, or find co-binding partners. Trigger on: motif analysis, HOMER, MEME, de novo motif, motif enrichment, findMotifsGenome, AME, MEME-ChIP, known motif, TF binding motif, co-factor, motif discovery.

multi-omics-integration

Integrate multiple ENCODE data types (RNA-seq, ATAC-seq, Histone ChIP-seq, TF ChIP-seq) for a tissue/cell type to build a comprehensive regulatory landscape. Use when the user wants to answer "what are the enhancers, promoters, and regulatory elements active in my tissue, and which transcription factors control them?" by layering expression, chromatin accessibility, histone marks, and TF binding data. Follows the Mawla et al. 2023 framework for cross-assay integration of islet cell type-specific data. Handles chromatin state annotation (ChromHMM), enhancer-gene linkage, TF motif enrichment, and cell type-specific regulatory element identification. Use for ANY multi-omic analysis, enhancer discovery, regulatory network construction, or epigenomic characterization using ENCODE data.

peak-annotation

Guide for annotating ENCODE peaks with genomic features using ChIPseeker and GREAT. Use when users need to assign peaks to genes, determine genomic feature distribution (promoter, intron, intergenic), or perform gene ontology enrichment of peak-associated genes. Trigger on: peak annotation, ChIPseeker, GREAT, peak to gene, genomic feature, promoter enrichment, gene ontology, peak distribution, TSS distance, nearest gene.

pipeline-atacseq

Execute ENCODE ATAC-seq processing pipeline from FASTQ to peaks and signal tracks. Child of pipeline-guide. Provides stage-by-stage Nextflow execution with Docker containers and cloud deployment. Handles Tn5 transposase offset correction, mitochondrial read removal, nucleosome-free fragment selection, and TSS enrichment scoring. Use when users need to process ATAC-seq data following ENCODE standards. Trigger on: ATAC-seq pipeline, run ATAC-seq, process ATAC-seq, chromatin accessibility, open chromatin, Tn5 shift, TSS enrichment.
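
As a rough illustration of Tn5 offset correction, the sketch below applies the widely used convention of shifting forward-strand reads by +4 bp and reverse-strand reads by -5 bp. The listing does not say which offsets the pipeline uses, so the offsets, the function name `tn5_shift`, and the BED-style coordinates are assumptions.

```python
from typing import Tuple


def tn5_shift(start: int, end: int, strand: str) -> Tuple[int, int]:
    """Shift a BED-style (0-based, half-open) alignment for Tn5 insertion.

    Assumed convention: +4 bp for '+' strand reads, -5 bp for '-' strand
    reads. Coordinates are clamped at 0 so a shift never goes negative.
    """
    if strand == "+":
        offset = 4
    elif strand == "-":
        offset = -5
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return max(0, start + offset), max(0, end + offset)


if __name__ == "__main__":
    print(tn5_shift(100, 150, "+"))  # forward-strand read
    print(tn5_shift(100, 150, "-"))  # reverse-strand read
```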

pipeline-chipseq

Execute ENCODE ChIP-seq processing pipeline from FASTQ to peaks and signal tracks. Child of pipeline-guide. Provides stage-by-stage Nextflow execution with Docker containers and cloud deployment. Use when users need to process ChIP-seq data following ENCODE standards, run peak calling with MACS2, perform IDR analysis, or generate signal tracks. Trigger on: ChIP-seq pipeline, run ChIP-seq, process ChIP-seq, MACS2 peak calling, IDR analysis, ChIP-seq FASTQ processing.

pipeline-cutandrun

Execute CUT&RUN processing pipeline from FASTQ to peaks and signal tracks. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing CUT&RUN or CUT&Tag data, which are lower-background alternatives to ChIP-seq. Trigger on: CUT&RUN pipeline, CUT&Tag, SEACR, Henikoff, targeted chromatin, pA-MNase, process CUT&RUN.

pipeline-dnaseseq

Execute ENCODE DNase-seq pipeline from FASTQ to hotspots and footprints. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing DNase-seq data, calling DNase hypersensitive sites, or performing footprinting analysis. Trigger on: DNase-seq pipeline, DNase hypersensitive, DHS, Hotspot2, footprinting, DNase I, chromatin accessibility DNase.

pipeline-guide

Access ENCODE uniform analysis pipelines, generate user-specific Nextflow/WDL pipelines, manage compute resources, and integrate with cloud platforms. Use when the user wants to understand ENCODE pipelines, run pipelines on their own data, generate custom Nextflow workflows from ENCODE pipeline code, check compute requirements (CPU/GPU/memory), run pipelines in background, or integrate with Google Cloud, AWS, or other cloud platforms. Also use when the user asks about ENCODE pipeline outputs, processing standards, software versions, or wants to replicate ENCODE processing. Covers local execution, HPC, and cloud deployment with resource-aware scheduling. Use this skill for ANY pipeline execution, workflow generation, or compute resource management task involving ENCODE data.

pipeline-hic

Execute ENCODE Hi-C pipeline from FASTQ to contact matrices and loop calls. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing Hi-C data, generating contact matrices, or calling loops or TADs. Trigger on: Hi-C pipeline, chromatin conformation, contact matrix, loop calling, TAD detection, Juicer, HiCCUPS, 3D genome.

pipeline-rnaseq

Execute ENCODE RNA-seq pipeline from FASTQ to gene quantification and signal tracks. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing RNA-seq data with STAR alignment, RSEM/kallisto quantification, or generating expression matrices. Trigger on: RNA-seq pipeline, gene expression, STAR alignment, RSEM quantification, transcript quantification, TPM, FPKM, RNA processing, run RNA-seq.

pipeline-wgbs

Execute ENCODE Whole Genome Bisulfite Sequencing (WGBS) pipeline from FASTQ to methylation calls. Child of pipeline-guide. Provides Nextflow execution with Docker and cloud deployment. Use when processing WGBS/bisulfite-seq data, calling methylation levels, or generating bedMethyl files. Trigger on: WGBS pipeline, bisulfite sequencing, methylation calling, DNA methylation pipeline, bismark, bwa-meth, bedMethyl.

publication-trust

Assess the scientific integrity and trustworthiness of publications before relying on their findings. Use this skill whenever evaluating a paper for a workflow, citing a study, building an analysis on published methods, or when a user asks about the reliability of a study. Checks for formal retractions, corrections, expressions of concern, and — critically — informal contradictions where subsequent studies failed to reproduce key findings. Integrates with PubMed, bioRxiv, and Consensus to provide a trust assessment. Use this skill for ANY publication evaluation, retraction checking, author reliability assessment, or when a user says "can I trust this paper", "is this study reliable", "has this been refuted", or "check this publication".

scientific-writing

Generate publication-ready methods sections, figure legends, supplementary tables, and data availability statements from ENCODE analysis provenance. Implements the scientific documentation standards requiring complete metadata reporting. Use when the user needs to write methods, generate figure legends, create supplementary tables, draft data availability statements, compile tool citations, or auto-generate any publication text from their ENCODE analysis. Trigger on: methods section, figure legend, supplementary table, data availability, tool citations, publication writing, manuscript, write methods, methods draft, write up, write-up, paper writing, reproducible methods.

scrna-meta-analysis

Conduct rigorous cross-study meta-analysis of scRNA-seq data from ENCODE, integrating multiple single-cell transcriptomic datasets for a tissue/cell type. Use when the user wants to answer "what cell types exist in my tissue and what genes define them?" by combining scRNA-seq data across donors, labs, and platforms. Follows the Mawla et al. 2019 framework for assessing cross-study reproducibility, TIN-based quality filtering, and detection-limit-aware interpretation. Handles batch correction (Harmony/Seurat), dropout awareness, cross-contamination artifacts, and platform-specific biases. Use this skill for ANY scRNA-seq integration task, cross-dataset comparison, cell atlas construction, or reproducibility assessment involving ENCODE single-cell data.

search-encode

Search and explore ENCODE Project genomics data. Use when the user wants to find experiments, files, or explore what data is available for specific assays, organs, cell lines, or targets.

single-cell-encode

Find and work with ENCODE single-cell genomics data including scRNA-seq and scATAC-seq. Use when the user asks about single-cell experiments, cell type resolution, clustering from ENCODE data, deconvolution of bulk signals using single-cell references, or comparing single-cell vs bulk profiles. Covers platform differences (10x Chromium, Smart-seq2, Drop-seq), quality limitations of single-cell data, multimodal integration (RNA+ATAC), and cross-study reproducibility concerns. Also use for cell type annotation, gene detection limits, dropout artifacts, and single-cell data structure in ENCODE.

track-experiments

Track ENCODE experiments locally with publications, citations, and provenance. Use when the user wants to build a collection of experiments, manage citations, compare experiments, or track data provenance.

API & Backend Listed

ucsc-browser

Query the UCSC Genome Browser REST API to retrieve regulatory tracks, DNA sequences, cCRE annotations, TF binding clusters, and track schemas for any genomic region. Use when the user wants to look up what regulatory elements exist at a genomic locus, retrieve DNA sequence under peaks, query ENCODE cCREs or TF rPeak clusters from UCSC, check what tracks are available for a genome assembly, get chromatin accessibility across cell types, or cross-reference ENCODE data with UCSC-hosted annotations. Also use when the user mentions UCSC, genome browser, cCRE lookup, SCREEN, TF binding clusters, DNA sequence retrieval, or track data extraction.

variant-annotation

Annotate genetic variants (GWAS hits, eQTLs, rare variants) with ENCODE functional data to interpret non-coding variation. Use when the user has variants of interest and wants to understand their regulatory context, identify causal variants from GWAS loci, assess variant impact on regulatory elements, perform enrichment testing of variant sets in tissue-specific annotations, or link variants to target genes through enhancer-gene maps. Handles the full post-GWAS workflow from variant set → tissue mapping → functional annotation → fine-mapping awareness → enrichment → variant-to-gene → prioritization. Use this skill for ANY variant interpretation task involving ENCODE chromatin, accessibility, TF binding, or 3D genome data.

visualization-workflow

Comprehensive guide for visualizing ENCODE data including deeptools heatmaps, IGV screenshots, UCSC track hubs, and publication-quality plots. Use when users need to create visualizations of ChIP-seq signal, peak landscapes, genome browser views, or any visual representation of ENCODE data. Trigger on: heatmap, visualization, genome browser, track hub, IGV, deeptools, signal plot, peak visualization, profile plot, publication figure, bigWig visualization.