diarios

Reference for the shared diarios Python module: API lookup, usage patterns, and gotchas. Use it when writing code that involves court data, legal text parsing, or Brazilian administrative data cleaning, or when checking whether a utility already exists before writing a new one.
hsigstad/research-kit · Data & Documents

Install: claude install-skill hsigstad/research-kit

# diarios Module Reference

The `diarios` package is a shared Python module used across projects for Brazilian court and administrative data processing. **Always check here before writing new utility functions.**

## Quick API reference

The workspace root contains `CLAUDE.md` alongside `projects/`, `pipelines/`, `diarios/`, and `research/`. If the current directory is inside a project or pipeline, search upward to find the root.

Read `$ROOT/research/meta/diarios_api.md` for the full module inventory. Key areas:

### Text & Data Cleaning (`diarios.clean.text`)

- `clean_text()`: clean text (character removal, case, accents)
- `map_regex()` / `remove_regexes()` / `extract_series()`: regex utilities
- `add_leads_and_lags()`: panel data lag/lead construction
- `read_csv()`: CSV reader with sensible defaults

### Number & ID Cleaning (`diarios.clean.numbers`)

- `clean_cnj_number()` / `is_cnj_number()`: CNJ case number normalization
- `clean_reais()` / `parse_brl()`: Brazilian currency parsing
- `clean_cpf()`: CPF tax ID cleaning
- `clean_oab()`: OAB lawyer number cleaning
- `get_tribunal()` / `get_filing_year()`: extract metadata from case numbers

### Legal Domain (`diarios.clean.legal`)

- `clean_parte()` / `clean_parte_key()`: party name cleaning
- `clean_classe()`: case class normalization
- `get_procedencia()` / `get_plaintiffwins()`: outcome extraction
- `load_datajud_jsonl()` / `normalize_datajud()`: DataJud data loading

### Geography (`diarios.clean.geo`)

- `TRT`