
bigdata-machine-learning

Machine learning toolkit for big data teams. Includes scikit-learn, PyTorch Lightning, Transformers, and SHAP for model training, deployment, and interpretation. Use when building ML pipelines, training models, or explaining predictions.
MARUCIE/openclaw-foundry · ★ 1 · AI & Automation · score 67
Install: claude install-skill MARUCIE/openclaw-foundry
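# Big Data Machine Learning Toolkit

## Overview

A machine learning toolkit for big data teams, covering everything from traditional ML to deep learning.

## Quick Reference

| Tool | Use case | Scale |
|------|----------|-------|
| **scikit-learn** | Traditional ML | Medium-sized data |
| **PyTorch Lightning** | Deep learning | GPU training |
| **Transformers** | NLP/LLM | Pretrained models |
| **SHAP** | Model interpretation | Explainable AI |

## Selection Guide

```
Task type:
├── Classification/regression → scikit-learn
├── Time series → scikit-learn + statsmodels
├── Text processing → Transformers
├── Image processing → PyTorch Lightning
├── Reinforcement learning → stable-baselines3
└── Model interpretation → SHAP
```

## Sub-skills

- `scikit-learn/` - Traditional machine learning
- `pytorch-lightning/` - Deep learning framework
- `transformers/` - Pretrained NLP models
- `shap/` - Model interpretability
- `stable-baselines3/` - Reinforcement learning

## Common Patterns

### Standard ML Pipeline (scikit-learn)

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Scale the features, then fit a random forest, as a single estimator
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier())
])

# X (feature matrix) and y (labels) are assumed to be loaded already
scores = cross_val_score(pipeline, X, y, cv=5)  # 5-fold cross-validation
print(f"CV Score: {scores.mean():.3f} ± {scores.std():.3f}")
```

### Deep Learning Training (PyTorch Lightning)

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

# Stop training once the monitored metric stops improving;
# the model must log a metric named "val_loss" for this to work
trainer = pl.Trainer(
    max_epochs=100,
    accelerator="gpu",
    callbacks=[EarlyStopping(monitor="val_loss")]
)

# model (a LightningModule), train_loader and val_loader (DataLoaders)
# are assumed to be defined elsewhere
trainer.fit(model, train_loader, val_loader)
```

### NLP Tasks (Transformers)

```python
from transformers import pipeline

# Text classification
classifier = pipeline("text-classification", mode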