alterlab-pytdc

Solid

Loads Therapeutics Data Commons (TDC, PyTDC) AI-ready drug-discovery datasets and benchmarks — ADME, toxicity, drug-target interaction (DTI), scaffold splits, and molecular oracles for therapeutic ML and pharmacological prediction. Use when fetching a standardized benchmark dataset, applying scaffold or cold-split evaluation, or sourcing labeled molecules for ADMET, toxicity, or DTI modeling. Sources data, splits, and oracles only — defer molecular featurization (ECFP/fingerprints), model training, and transformers to a molecular-ML skill (e.g. deepchem). Part of the AlterLab Academic Skills suite.

AI & Automation | 27 stars | 4 forks | Updated today | MIT


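Quality Score: 87/100

| Criterion | Weight | Score |
|---|---|---|
| Stars | 20% | 48 |
| Recency | 20% | 100 |
| Frontmatter | 20% | 70 |
| Documentation | 15% | 100 |
| Issue Health | 10% | 50 |
| License | 10% | 100 |
| Description | 5% | 100 |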

Skill Content

# PyTDC (Therapeutics Data Commons)

## Overview

PyTDC is an open-science platform providing AI-ready datasets and benchmarks for drug discovery and development. It gives access to curated datasets spanning the entire therapeutics pipeline, with standardized evaluation metrics and meaningful data splits, organized into three categories: single-instance prediction (molecular/protein properties), multi-instance prediction (drug-target interactions, DDI), and generation (molecule generation, retrosynthesis).

## When to Use This Skill

This skill should be used when:
- Working with drug discovery or therapeutic ML datasets
- Benchmarking machine learning models on standardized pharmaceutical tasks
- Predicting molecular properties (ADME, toxicity, bioactivity)
- Predicting drug-target or drug-drug interactions
- Generating novel molecules with desired properties
- Accessing curated datasets with proper train/test splits (scaffold, cold-split)
- Using molecular oracles for property optimization

## Installation & Setup

Install PyTDC with `uv pip` (plain `pip` works the same way):

```bash
uv pip install PyTDC
```

To upgrade to the latest version:

```bash
uv pip install PyTDC --upgrade
```

Core dependencies (automatically installed):
- numpy, pandas, tqdm, seaborn, scikit_learn, fuzzywuzzy

Additional packages are installed automatically as needed for specific features.

## Quick Start

The basic pattern for accessing any TDC dataset follows this structure:

```python
from tdc.<problem> import <Task>
data = <Task>(name='<Dat...
```
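A scaffold split groups molecules by their core ring scaffold so that no scaffold appears in more than one partition, which makes the test set structurally novel relative to training. The sketch below illustrates only that grouping idea in plain Python; it is not PyTDC's implementation. It assumes the scaffold strings were computed beforehand (for example with RDKit's Murcko scaffold utilities), and `scaffold_split` is a hypothetical helper that fills train, then valid, then test with whole scaffold groups in a deterministic largest-group-first order. PyTDC's own routine may order the groups differently (for example with a seeded shuffle), so its partitions will not match this sketch exactly.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def scaffold_split(
    scaffolds: Sequence[str],
    frac: Tuple[float, float, float] = (0.8, 0.1, 0.1),
) -> Dict[str, List[int]]:
    """Hypothetical helper: map row indices to train/valid/test so that
    all molecules sharing a scaffold end up in the same partition."""
    if abs(sum(frac) - 1.0) > 1e-6:
        raise ValueError("frac must sum to 1")

    # Group row indices by their precomputed scaffold string.
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold sets first; ties broken by first occurrence for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cutoff = frac[0] * n
    valid_cutoff = (frac[0] + frac[1]) * n
    split: Dict[str, List[int]] = {"train": [], "valid": [], "test": []}
    for group in ordered:
        # Whole groups are placed at once, so a scaffold never straddles two splits.
        if len(split["train"]) + len(group) <= train_cutoff:
            split["train"].extend(group)
        elif len(split["train"]) + len(split["valid"]) + len(group) <= valid_cutoff:
            split["valid"].extend(group)
        else:
            split["test"].extend(group)
    return split


# Toy scaffold labels standing in for real Murcko scaffold SMILES.
scaffolds = ["A", "A", "A", "A", "B", "B", "C", "D", "E", "F"]
print(scaffold_split(scaffolds))
# → {'train': [0, 1, 2, 3, 4, 5, 6, 7], 'valid': [8], 'test': [9]}
```

Placing whole groups rather than individual rows is the point of the design: partition sizes only approximate `frac`, in exchange for a test set whose scaffolds were never seen during training.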
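A cold split for drug-target interaction data holds out whole entities rather than individual pairs: every pair involving a held-out drug (or target) goes to the same partition, so the model is evaluated on drugs it never saw during training. The following is a minimal plain-Python sketch of that idea, not PyTDC's cold-split code. It assumes pairs are `(drug_id, target_id)` tuples, and `cold_split` and its `entity` switch (0 for drugs, 1 for targets) are hypothetical names chosen for illustration.

```python
import random
from typing import Dict, List, Sequence, Tuple


def cold_split(
    pairs: Sequence[Tuple[str, str]],
    frac: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 42,
    entity: int = 0,
) -> Dict[str, List[int]]:
    """Hypothetical helper: split (drug, target) pairs so that every drug
    (entity=0) or target (entity=1) lands in exactly one partition."""
    # Unique entities, sorted first so the shuffle is reproducible for a given seed.
    entities = sorted({pair[entity] for pair in pairs})
    random.Random(seed).shuffle(entities)

    n_train = int(frac[0] * len(entities))
    n_valid = int(frac[1] * len(entities))

    # Assign each entity (not each pair) to a partition; the remainder goes to test.
    bucket: Dict[str, str] = {}
    for i, ent in enumerate(entities):
        if i < n_train:
            bucket[ent] = "train"
        elif i < n_train + n_valid:
            bucket[ent] = "valid"
        else:
            bucket[ent] = "test"

    # Every pair follows its entity, so held-out entities never appear in training.
    split: Dict[str, List[int]] = {"train": [], "valid": [], "test": []}
    for idx, pair in enumerate(pairs):
        split[bucket[pair[entity]]].append(idx)
    return split


# Ten toy drugs, each paired with two toy targets.
pairs = [(f"drug{d}", f"target{t}") for d in range(10) for t in range(2)]
parts = cold_split(pairs)
print({name: len(idxs) for name, idxs in parts.items()})
# → {'train': 16, 'valid': 2, 'test': 2}
```

Switching `entity=1` gives a cold-target split with the same logic; the fractions apply to unique entities, so pair counts per partition can deviate from `frac` when entities have uneven numbers of pairs.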

Details

Author: AlterLab-IEU
Repository: AlterLab-IEU/AlterLab-Academic-Skills
Created: 2 months ago
Last Updated: today
Language: Python
License: MIT
