jgi-lakehouse

Queries the JGI Lakehouse (Dremio) for genomics metadata from GOLD, IMG, Mycocosm, and Phytozome. Downloads genome files from the JGI filesystem using IMG taxon OIDs, and links JGI taxon OIDs to read files through PMO/GOLD identifiers and JAMO. Use when working with JGI data, GOLD projects, or IMG annotations, or when downloading genomes.
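Lakehouse queries are ANSI SQL executed by Dremio, and source names such as `gold-db-2 postgresql` contain dashes and spaces, so they must be wrapped in double quotes. The Python sketch below shows one way to assemble such queries programmatically. `quote_identifier`, `table_path`, and `build_select` are hypothetical helpers, not part of the skill, and they only build the SQL string. Submitting it to Dremio, including connection and authentication details, is not shown.

```python
# A minimal sketch (hypothetical helpers, not part of the skill): build
# Dremio-compatible ANSI SQL strings for JGI Lakehouse tables.
import re
from typing import Optional, Sequence

# Path segments made only of letters, digits and underscores can stay bare;
# anything else (dashes, spaces, ...) is wrapped in double quotes.
_BARE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_identifier(part: str) -> str:
    """Quote one path segment if it contains characters such as '-' or ' '."""
    if _BARE.match(part):
        return part
    # ANSI SQL escapes an embedded double quote by doubling it.
    return '"' + part.replace('"', '""') + '"'


def table_path(*parts: str) -> str:
    """Join source, schema and table into a dotted, safely quoted path."""
    return ".".join(quote_identifier(p) for p in parts)


def build_select(
    columns: Sequence[str],
    parts: Sequence[str],
    where: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Assemble a SELECT statement; pass limit=None for comprehensive results."""
    sql = f"SELECT {', '.join(columns)} FROM {table_path(*parts)}"
    if where:
        # The WHERE clause is inserted verbatim, so it must be trusted input.
        sql += f" WHERE {where}"
    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql + ";"


if __name__ == "__main__":
    # Rebuilds the README's quick-test query against the GOLD project table.
    print(build_select(
        ["gold_id", "project_name"],
        ("gold-db-2 postgresql", "gold", "project"),
        where="is_public = 'Yes'",
        limit=5,
    ))
```

Quoting only the segments that need it keeps plain names like `gold` and `img_core_v400` readable. Reserved words are not detected, so a table or column named after a SQL keyword would still need manual quoting.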
fmschulz/omics-skills · Data & Documents

Install: claude install-skill fmschulz/omics-skills

# JGI Lakehouse Skill

## Quick Start

**What is it?** JGI's unified data warehouse (651 tables) plus filesystem access to genome files.

**Two data access methods:**

1. **Lakehouse (Dremio)** → Metadata, annotations, taxonomy (no sequences)
2. **JGI Filesystem** → Actual genome files (FNA, FAA, GFF) via taxon OID

**SQL Dialect:** ANSI SQL (not PostgreSQL)

- Use `CAST(x AS type)`, not `::`
- Use `REGEXP_LIKE()`, not `~`
- Identifiers containing dashes or spaces need double quotes: `"gold-db-2 postgresql"`

```sql
-- Quick test
SELECT gold_id, project_name
FROM "gold-db-2 postgresql".gold.project
WHERE is_public = 'Yes'
LIMIT 5;
```

---

## When to Use

- Query JGI genomics metadata (GOLD, IMG, Mycocosm, Phytozome)
- Find genomes and/or metagenomes by taxonomy, ecosystem, or phenotype
- Download microbial genomes with IMG taxon OIDs
- Cross-reference GOLD projects with IMG annotations

## Instructions

1. Decide whether the task needs metadata, files, or read recovery.
2. Use Lakehouse SQL for metadata and annotations, and the JGI filesystem or JAMO for sequence files.
3. Inspect schemas with a small `LIMIT`; remove the `LIMIT` for comprehensive results.
4. Record the source, table, fields, filters, and access route in every result summary.

## Quick Reference

| Task | Action |
|---|---|
| Test Lakehouse | Query `"gold-db-2 postgresql".gold.project` |
| Query IMG metadata | Use `"img-db-2 postgresql".img_core_v400.*` tables |
| Query NUMG proteins | Join `faa` and `gene2pfam` on both `oid` and `gene_oid`