dvc-dataset-versioning
Dataset versioning skill using DVC for tracking data changes, managing data pipelines, and ensuring reproducibility.
Data & Documents 814 stars
53 forks Updated today MIT
Skill Content
# dvc-dataset-versioning
## Overview
Dataset versioning skill using DVC (Data Version Control) for tracking data changes, managing data pipelines, and ensuring reproducibility in ML workflows.
## Capabilities
- Dataset version tracking
- Data pipeline definition and execution
- Remote storage management (S3, GCS, Azure, etc.)
- Reproducibility enforcement
- Data lineage tracking
- Experiment comparison with data versions
- Cache management for large datasets
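Version tracking of this kind rests on content hashing: DVC records a hash of each tracked file in a small metafile that Git versions, so a change in the data shows up as a change in the recorded hash. A minimal sketch of the idea, assuming a single file and a plain MD5 digest. The helper names `file_md5` and `has_changed` are illustrative and not part of DVC's API, and the digest DVC itself stores may differ by version and for directories:

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until fh.read returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def has_changed(path, recorded_md5):
    """Hypothetical helper: True when the file no longer matches a recorded hash."""
    return file_md5(path) != recorded_md5
```

Reading in chunks matters for the large datasets this skill targets, since the whole file never has to fit in memory.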
## Target Processes
- Data Collection and Validation Pipeline
- ML Model Retraining Pipeline
- Feature Store Implementation
## Tools and Libraries
- DVC
- Git
- Remote storage SDKs (boto3, google-cloud-storage, etc.)
## Input Schema
```json
{
  "type": "object",
  "required": ["action"],
  "properties": {
    "action": {
      "type": "string",
      "enum": ["init", "add", "push", "pull", "diff", "checkout", "run", "repro"],
      "description": "DVC action to perform"
    },
    "paths": {
      "type": "array",
      "items": { "type": "string" },
      "description": "File or directory paths to track"
    },
    "remote": {
      "type": "string",
      "description": "Remote storage name"
    },
    "revision": {
      "type": "string",
      "description": "Git revision for checkout/diff"
    },
    "pipeline": {
      "type": "object",
      "description": "Pipeline stage definition for run action"
    }
  }
}
```
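To make the schema concrete, here is a hedged sketch of how a request might be checked and translated into DVC command lines. The function `build_commands` and the mapping it applies are assumptions for illustration, not the skill's actual implementation. It only builds argument lists and does not run anything. The `run` action is left unmapped because the shape of the `pipeline` object is not specified above.

```python
ACTIONS = {"init", "add", "push", "pull", "diff", "checkout", "run", "repro"}


def build_commands(request):
    """Translate a request dict into a list of argv lists (nothing is executed)."""
    action = request.get("action")
    if action not in ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")

    paths = list(request.get("paths", []))
    remote = request.get("remote")
    revision = request.get("revision")

    if action == "init":
        return [["dvc", "init"]]
    if action == "add":
        if not paths:
            raise ValueError("'add' needs at least one path")
        return [["dvc", "add", *paths]]
    if action in ("push", "pull"):
        cmd = ["dvc", action]
        if remote:
            cmd += ["-r", remote]  # -r selects a named remote
        return [cmd + paths]
    if action == "diff":
        return [["dvc", "diff"] + ([revision] if revision else [])]
    if action == "checkout":
        # Assumed workflow: switch Git first, then sync the workspace data.
        cmds = [["git", "checkout", revision]] if revision else []
        return cmds + [["dvc", "checkout", *paths]]
    if action == "repro":
        return [["dvc", "repro"]]
    # "run": the pipeline object's fields are unspecified, so no mapping is guessed.
    raise NotImplementedError("'run' depends on the unspecified pipeline object")
```

For example, `build_commands({"action": "push", "remote": "s3store"})` yields `[["dvc", "push", "-r", "s3store"]]`, where `s3store` is a made-up remote name. Each inner list could then be passed to `subprocess.run(cmd, check=True)`.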
## Output Schema
```json
{
  "type": "object",
  "required": ["status", "action"],
  "prop...
```
Details
- Author: a5c-ai
- Repository: a5c-ai/babysitter
- Created: 4 months ago
- Last Updated: today
- Language: JavaScript
- License: MIT