Install: `claude install-skill hsigstad/research-kit`
# diarios Module Reference
The `diarios` package is a shared Python module used across projects for Brazilian court and administrative data processing. **Always check here before writing new utility functions.**
## Quick API reference
The workspace root is the directory that contains `CLAUDE.md` alongside `projects/`, `pipelines/`, `diarios/`, and `research/`. If the current directory is inside a project or pipeline, search upward through the parent directories until you reach the root.
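
The upward search can be sketched as follows. This is a minimal illustration, not part of `diarios`: the function name `find_workspace_root` is hypothetical, and requiring a `diarios/` directory next to `CLAUDE.md` is an assumption made here so that a `CLAUDE.md` inside a single project is not mistaken for the root.

```python
from pathlib import Path
from typing import Optional


def find_workspace_root(start: Optional[Path] = None) -> Path:
    """Walk upward from `start` (default: the current directory) to the workspace root.

    Hypothetical helper: a directory counts as the root when it holds both
    a CLAUDE.md file and a diarios/ directory (assumed heuristic).
    """
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in (current, *current.parents):
        if (candidate / "CLAUDE.md").is_file() and (candidate / "diarios").is_dir():
            return candidate
    raise FileNotFoundError(f"No workspace root found at or above {current}")
```

With the root in hand, the API inventory path would be `find_workspace_root() / "research" / "meta" / "diarios_api.md"`.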
Read `$ROOT/research/meta/diarios_api.md` for the full module inventory. Key areas:
### Text & Data Cleaning (`diarios.clean.text`)
- `clean_text()` — clean text (character removal, case, accents)
- `map_regex()` / `remove_regexes()` / `extract_series()` — regex utilities
- `add_leads_and_lags()` — panel data lag/lead construction
- `read_csv()` — CSV reader with sensible defaults
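
To make the kind of normalization listed above concrete (character removal, case, accents), here is a standalone sketch using only the standard library. It is not the `diarios.clean.text` implementation, and the names `strip_accents` and `normalize_text` are hypothetical; check the real `clean_text()` signature and options in `diarios_api.md` before relying on it.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks, e.g. 'ã' becomes 'a' and 'ç' becomes 'c'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_text(text: str) -> str:
    """Illustrative cleaner: remove accents, lowercase, keep only [a-z0-9], collapse spaces."""
    text = strip_accents(text).lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace


print(normalize_text("  São Paulo - Ação Cível "))  # sao paulo acao civel
```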
### Number & ID Cleaning (`diarios.clean.numbers`)
- `clean_cnj_number()` / `is_cnj_number()` — CNJ case number normalization
- `clean_reais()` / `parse_brl()` — Brazilian currency parsing
- `clean_cpf()` — CPF tax ID cleaning
- `clean_oab()` — OAB lawyer number cleaning
- `get_tribunal()` / `get_filing_year()` — extract metadata from case numbers
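
For orientation, the sketch below shows what these helpers work with: the unified CNJ case number layout `NNNNNNN-DD.AAAA.J.TR.OOOO` (sequence, check digits, filing year, justice segment, tribunal, origin unit) and Brazilian currency strings such as `R$ 1.234,56`. It is a standalone illustration, not the `diarios.clean.numbers` code. The names `parse_cnj` and `parse_reais` are hypothetical, the sample number is made up, and the check digits are not validated.

```python
import re
from decimal import Decimal
from typing import Dict, Optional

# Separators are optional so that 20-digit, unformatted numbers also match.
CNJ_PATTERN = re.compile(r"^(\d{7})-?(\d{2})\.?(\d{4})\.?(\d)\.?(\d{2})\.?(\d{4})$")


def parse_cnj(number: str) -> Optional[Dict[str, str]]:
    """Split a CNJ case number into its fields; return None if the shape is wrong."""
    match = CNJ_PATTERN.match(number.strip())
    if match is None:
        return None
    sequence, check, year, segment, tribunal, origin = match.groups()
    return {
        "sequence": sequence,
        "check_digits": check,  # not verified in this sketch
        "filing_year": year,
        "segment": segment,
        "tribunal": tribunal,
        "origin": origin,
    }


def parse_reais(value: str) -> Decimal:
    """Convert 'R$ 1.234,56' to Decimal('1234.56'): '.' groups thousands, ',' marks decimals."""
    cleaned = value.replace("R$", "").strip()
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


fields = parse_cnj("0001234-56.2015.5.02.0001")  # made-up number
print(fields["filing_year"], fields["segment"], fields["tribunal"])  # 2015 5 02
print(parse_reais("R$ 1.234,56"))  # 1234.56
```

`Decimal` is used instead of `float` so that monetary amounts keep their exact cents.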
### Legal Domain (`diarios.clean.legal`)
- `clean_parte()` / `clean_parte_key()` — party name cleaning
- `clean_classe()` — case class normalization
- `get_procedencia()` / `get_plaintiffwins()` — outcome extraction
- `load_datajud_jsonl()` / `normalize_datajud()` — DataJud data loading
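
As a rough picture of what loading a JSONL export involves, the sketch below reads one JSON object per line with the standard library. It is not `load_datajud_jsonl()` itself: the function name `iter_jsonl` and the sample keys (`case_number`, `case_class`) are hypothetical, and the real DataJud schema is not assumed here.

```python
import io
import json
from typing import Iterator, TextIO


def iter_jsonl(handle: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line, reporting the line number on bad JSON."""
    for line_number, line in enumerate(handle, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"Invalid JSON on line {line_number}") from exc


# Made-up sample data with hypothetical keys.
sample = io.StringIO(
    '{"case_number": "0001234-56.2015.5.02.0001", "case_class": "A"}\n'
    "\n"
    '{"case_number": "0005678-90.2016.5.02.0002", "case_class": "B"}\n'
)
records = list(iter_jsonl(sample))
print(len(records))  # 2
```

Reading lazily with a generator keeps memory use flat for large exports; pass an open file handle in place of the `StringIO` sample.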
### Geography (`diarios.clean.geo`)
- `TRT`