Install: claude install-skill shipshitdev/skills
# Incremental Fetch
Build data pipelines that never lose progress and never re-fetch existing data.
## The Two Watermarks Pattern
Track TWO cursors to support both forward and backward fetching:
| Watermark | Purpose | API Parameter |
|-----------|---------|---------------|
| `newest_id` | Fetch new data since last run | `since_id` |
| `oldest_id` | Backfill older data | `until_id` |
A single watermark only fetches forward. Two watermarks enable:
- Regular runs: fetch NEW data (since `newest_id`)
- Backfill runs: fetch OLD data (until `oldest_id`)
- No overlap, no gaps
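A minimal sketch of how the two cursors might be held and widened after a run. It assumes IDs are integers that grow over time, which the source API may or may not guarantee; `Watermarks` and `widen` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Watermarks:
    """Both cursors for one source; None means nothing has been fetched yet."""
    newest_id: Optional[int] = None
    oldest_id: Optional[int] = None


def widen(current: Watermarks, seen_ids: Iterable[int]) -> Watermarks:
    """Return watermarks extended to cover every ID seen in a run.

    Assumption: IDs are comparable integers and larger means newer.
    """
    ids = list(seen_ids)
    if not ids:
        return current  # an empty run leaves both cursors untouched

    # Include the stored cursors so a run can only widen the covered range.
    newest_candidates = ids + ([current.newest_id] if current.newest_id is not None else [])
    oldest_candidates = ids + ([current.oldest_id] if current.oldest_id is not None else [])
    return Watermarks(newest_id=max(newest_candidates), oldest_id=min(oldest_candidates))
```

An update run pushes only `newest_id` upward and a backfill run pushes only `oldest_id` downward, so the covered range stays contiguous.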
## Critical: Data vs Watermark Saving
These are different operations with different timing:
| What | When to Save | Why |
|------|--------------|-----|
| **Data records** | After EACH page | Resilience: interrupted on page 47? Keep 46 pages |
| **Watermarks** | ONCE at end of run | Correctness: only commit progress after full success |
```
fetch page 1 → save records → fetch page 2 → save records → ... → update watermarks
```
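The timing above can be sketched as a single loop. This is an illustration under assumptions, not the skill's actual implementation: `run_once`, `save_records`, and `commit_watermarks` are hypothetical names, each record is assumed to be a dict with an integer `id`, and `pages` stands in for whatever paginated API client is in use.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, int]  # assumption: every record carries an integer "id"


def run_once(
    pages: Iterable[List[Record]],
    save_records: Callable[[List[Record]], None],
    commit_watermarks: Callable[[int, int], None],
) -> int:
    """Save each page as it arrives; commit watermarks once, after the last page."""
    newest: Optional[int] = None
    oldest: Optional[int] = None
    total = 0

    for page in pages:  # if fetching raises here, the commit below is never reached
        if not page:
            continue
        save_records(page)  # data is persisted immediately, page by page
        ids = [record["id"] for record in page]
        newest = max(ids) if newest is None else max(newest, *ids)
        oldest = min(ids) if oldest is None else min(oldest, *ids)
        total += len(page)

    if newest is not None and oldest is not None:
        # Reached only when every page was fetched and saved.
        commit_watermarks(newest, oldest)
    return total
```

If the run dies on page 47, the 46 saved pages stay in the database and the stored watermarks are unchanged. The next run may see some of those records again, so this sketch assumes `save_records` is idempotent (for example, an upsert keyed on `id`).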
## Workflow Decision Tree
```
First run (no watermarks)?
├── YES → Full fetch (no since_id, no until_id)
└── NO → Backfill flag set?
    ├── YES → Backfill mode (until_id = oldest_id)
    └── NO → Update mode (since_id = newest_id)
```
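The tree maps onto a small function that picks the request parameters for a run. `choose_params` is a hypothetical helper; treating a partially missing state as a first run is an assumption of this sketch, and whether `since_id` and `until_id` are inclusive depends on the API being called.

```python
from typing import Dict, Optional


def choose_params(
    newest_id: Optional[int],
    oldest_id: Optional[int],
    backfill: bool = False,
) -> Dict[str, int]:
    """Pick API parameters for this run from the stored watermarks."""
    if newest_id is None or oldest_id is None:
        # First run: no watermarks yet, so fetch without bounds.
        # Assumption: a half-populated state is also treated as a first run.
        return {}
    if backfill:
        # Backfill mode: walk backward from the oldest record already held.
        return {"until_id": oldest_id}
    # Update mode: fetch only what arrived after the newest record already held.
    return {"since_id": newest_id}
```

For example, `choose_params(100, 10)` yields `{"since_id": 100}`, while `choose_params(100, 10, backfill=True)` yields `{"until_id": 10}`.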
## Implementation Checklist
1. **Database**: Create the `ingestion_state` table (see `patterns.md`)
2. **Fetch loop**: Insert records immediately after each API page
3. **Watermark tracking**: Track newest/oldest IDs seen in this run
4. **Watermark upd