Install: `claude install-skill Methasit-Pun/data_engineer_claude_skills`
# Data Governance for Data Pipelines
## Why Governance Matters Now, Not Later
Governance work done retroactively costs far more than governance built in upfront; a common rule of thumb puts it at roughly 10x. When an audit arrives or a breach happens, you need to answer three questions fast: *What data do we have? Where does it come from? Who can see it?* If you can't answer these from a system, you'll answer them from a spreadsheet, under pressure and with incomplete information.
---
## PII Classification
### Sensitivity tiers
Classify every field that touches personal data before it enters the warehouse.
| Tier | Examples | Default access |
|---|---|---|
| **Public** | Country, product tier, aggregated metrics | All authenticated users |
| **Internal** | User ID, subscription status, behavioral events | Analysts and engineers |
| **Confidential** | Email, full name, phone number | Restricted to specific roles |
| **Restricted** | Payment card data, government ID, health data | Named individuals only, logged |
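
The tiers in the table are ordered, so an access check reduces to comparing a column's tier against the highest tier a role may read. The following is a minimal sketch of that idea, not a prescribed implementation: the role names and the `ROLE_MAX_TIER` mapping are hypothetical, and it assumes Restricted data is never granted through a role because the table reserves it for named individuals.

```python
from enum import IntEnum
from typing import Optional


class PiiTier(IntEnum):
    """Sensitivity tiers from the table, ordered least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical roles, each mapped to the highest tier it may read.
ROLE_MAX_TIER = {
    "authenticated_user": PiiTier.PUBLIC,
    "analyst": PiiTier.INTERNAL,
    "engineer": PiiTier.INTERNAL,
    "support_lead": PiiTier.CONFIDENTIAL,
}


def can_read(role: str, column_tier: PiiTier) -> bool:
    """Return True if the role's ceiling covers the column's tier."""
    # Assumption: Restricted access goes to named individuals, never to a role.
    if column_tier is PiiTier.RESTRICTED:
        return False
    ceiling: Optional[PiiTier] = ROLE_MAX_TIER.get(role)
    # Unknown roles are denied by default.
    if ceiling is None:
        return False
    return column_tier <= ceiling
```

Denying unknown roles by default keeps a typo in a role name from silently widening access.
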
### Tagging in dbt
```yaml
# models/staging/schema.yml
models:
  - name: stg_users
    columns:
      - name: user_id
        meta:
          pii_tier: internal
      - name: email
        meta:
          pii_tier: confidential
          pii_type: email_address
          regulation: [GDPR, PDPA]
      - name: full_name
        meta:
          pii_tier: confidential
          pii_type: name
```
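
Once columns carry a `pii_tier` tag, the tags can be read back to produce an inventory of personal data. The sketch below assumes the tags surface in dbt's `target/manifest.json` under `nodes.<id>.columns.<name>.meta`; that layout may differ between dbt versions, so treat the key paths as assumptions to verify. The `pii_columns` helper and the inline `sample` dictionary are illustrative only.

```python
import json


def pii_columns(manifest: dict) -> list[tuple[str, str, str]]:
    """Return sorted (model, column, pii_tier) rows for every tagged column."""
    rows = []
    for node in manifest.get("nodes", {}).values():
        # Skip tests, seeds, and other non-model resources.
        if node.get("resource_type") != "model":
            continue
        for col_name, col in node.get("columns", {}).items():
            tier = col.get("meta", {}).get("pii_tier")
            if tier:
                rows.append((node["name"], col_name, tier))
    return sorted(rows)


# Hand-written stand-in for a parsed manifest, mirroring the schema.yml above.
# With a real project you might load it instead (path is an assumption):
#   with open("target/manifest.json") as f:
#       manifest = json.load(f)
sample = {
    "nodes": {
        "model.my_project.stg_users": {
            "resource_type": "model",
            "name": "stg_users",
            "columns": {
                "user_id": {"meta": {"pii_tier": "internal"}},
                "email": {"meta": {"pii_tier": "confidential"}},
                "full_name": {"meta": {"pii_tier": "confidential"}},
                "created_at": {"meta": {}},
            },
        }
    }
}

for model, column, tier in pii_columns(sample):
    print(f"{model}.{column}: {tier}")
```

Untagged columns such as `created_at` are omitted from the output, which makes the same walk usable as a CI check if you invert the condition to flag columns with no tier.
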
### Tagging in BigQuery (policy tags)
```sql
-- Assign a policy tag to restrict column-level access
ALTER TABLE `project.dataset.u