document-analysis

Use this skill when extracting, sanitizing, segmenting, or analyzing a document end-to-end through the RegulAItor pipeline (PDF or Markdown). Activates the full extract→sanitize→segment→loop[gate→retriever→analyst→auditor]→aggregate flow with SSDLC-aligned defaults.
enriquerodrig/regulaitor · ★ 0 · Data & Documents · score 65
Install: claude install-skill enriquerodrig/regulaitor
# Document Analysis (H5)

## When to use

- Analyzing a corporate document (policy, contract, impact assessment) against an EU regulatory corpus (AI Act, GDPR, NIS2, DORA).
- Extending or debugging the document pipeline modules (`document/extractor.py`, `document/sanitizer.py`, `document/segmenter.py`, `orchestration/document_graph.py`).
- Adding new anti-injection patterns for document mode.

## When NOT to use

- Chat queries → use `orchestration.graph.run` (H4) instead.
- Corpus ingestion (regulatory text) → that is `corpus/fetch.py` + `corpus/parse.py` (H1), not this pipeline.
- One-off PDF inspection → use the MCP tool `extract_document` directly; do not wrap it in custom orchestration.

## Canonical procedure

The single supported entrypoint is:

```python
from regulaitor.orchestration.document_graph import run_document

report = run_document(
    file_bytes=open("policy.pdf", "rb").read(),
    mime_type="application/pdf",
    language="es",
    corpus=["ai_act", "gdpr"],
)
```

CLI equivalent:

```bash
python -m scripts.analyze --file policy.pdf --lang es --corpus ai_act,gdpr
```

## What the pipeline guarantees

1. **No bypass of the sanitizer.** MCP tools `extract_document` and `segment_document` are inspection helpers; the only way to run the full E2E flow is `run_document(...)` (in-process).
2. **No citation, no answer.** Every Finding returned has at least one literal citation validated against the corpus.
3. **Deterministic verdict aggregation.** Per-Finding leni