dataset-transformation

Solid

Generates a Jupyter notebook that transforms datasets between ML schemas for model training or evaluation. Use when the user says "transform", "convert", "reformat", "change the format", or when a dataset's schema needs to change to match the target format — always use this skill for format changes rather than writing inline transformation code. Supports OpenAI chat, SageMaker SFT/DPO/RLVR, HuggingFace preference, Bedrock Nova, VERL, and custom JSONL formats from local files or S3.

Data & Documents 765 stars 108 forks Updated 2 days ago Apache-2.0

Install

View on GitHub

Quality Score: 95/100

Stars 20%
96
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# Dataset Transformation Agent Transforms a data set provided by the user into their desired format. All transformation code is delivered as a Jupyter notebook. ## When to Use - User needs to generate code for transforming datasets for SageMaker model training or model evaluation. - A dataset requires processing, cleaning, or formatting before training or evaluation. - Workflow requires a formal review and approval cycle before execution. ## Principles 1. **One thing at a time.** Each response advances exactly one decision. Never combine multiple questions or recommendations in a single turn. 2. **Confirm before proceeding.** Wait for the user to agree before moving to the next step. You are a guide, not a runaway train. 3. **Don't read files until you need them.** Only read reference files when you've reached the workflow step that requires them and the user has confirmed the direction. Never read ahead. 4. **No narration.** Don't explain what you're about to do or what you just did. Share outcomes and ask questions. Keep responses short and focused. 5. **No repetition.** If you said something before a tool call, don't repeat it after. Only share new information. 6. **Do not deviate from the Workflow.** The steps listed in the workflow should be followed exactly as described. Progress from Step 1 to Step 10 to complete the task. Do not deviate from the workflow! 7. **Always end with a question.** Whenever you pause for user input, acknowledgment, or feedback, your respo...

Details

Author
awslabs
Repository
awslabs/agent-plugins
Created
3 months ago
Last Updated
2 days ago
Language
Shell
License
Apache-2.0

Integrates with

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Listed

transforming-data

Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.

368 Updated 5 months ago
ancoleman
Data & Documents Listed

dataset-curator

Use this skill when designing, cleaning, deduplicating, or documenting datasets for model training and evaluation including schema design, class imbalance handling, and train/val/test splits. Not for running model training or hyperparameter tuning. Not for real-time data pipeline engineering.

15 Updated 2 days ago
NickCrew
Data & Documents Solid

dataset-evaluation

Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.

765 Updated 2 days ago
awslabs
Data & Documents Listed

datarobot-data-preparation

Tools and guidance for data upload, dataset management, data validation, and preparing data for DataRobot projects. Use when uploading datasets, managing data, or validating data for DataRobot.

16 Updated 2 days ago
datarobot-oss
AI & Automation Solid

dataset-loader-creator

Create dataset loader creator operations. Auto-activating skill for ML Training. Triggers on: dataset loader creator, dataset loader creator Part of the ML Training skill category. Use when working with dataset loader creator functionality. Trigger with phrases like "dataset loader creator", "dataset creator", "dataset".

2,274 Updated today
jeremylongshore