cost-optimization-datalisted
Install: claude install-skill Methasit-Pun/data_engineer_claude_skills
# Cost Optimization for Cloud Data Infrastructure
## Find the Money First
Before optimizing anything, identify the actual cost drivers. Cloud consoles lie by omission — the default billing view shows totals, not which queries or tables are responsible.
### BigQuery: find expensive queries
```sql
-- Top 20 most expensive queries in the last 7 days
SELECT
user_email,
total_bytes_processed / POW(1024, 4) AS tb_processed,
total_bytes_processed / POW(1024, 4) * 6.25 AS estimated_cost_usd, -- on-demand rate
query,
creation_time
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
AND job_type = 'QUERY'
AND state = 'DONE'
ORDER BY total_bytes_processed DESC
LIMIT 20;
```
```sql
-- Top tables by bytes scanned (which tables are being read most expensively)
SELECT
referenced_table.table_id AS table_name,
COUNT(*) AS query_count,
SUM(total_bytes_processed) / POW(1024, 4) AS total_tb_scanned
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
UNNEST(referenced_tables) AS referenced_table
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY 1
ORDER BY total_tb_scanned DESC
LIMIT 20;
```
### Snowflake: find credit-burning warehouses
```sql
-- Warehouse credit consumption by day
SELECT
warehouse_name,
DATE(start_time) AS usage_date,
SUM(credits_used) AS credits_used,
SUM(credits_used) * 3.0 AS estimated_cost_usd --