regex-vs-llm-structured-text

Decision framework for choosing between regex and LLM when parsing structured text — start with regex, add LLM only for low-confidence edge cases.
SilantevBitcoin/Base-system-Claude · ★ 1 · AI & Automation · score 74

Install: claude install-skill SilantevBitcoin/Base-system-Claude
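The description's core rule (regex first, LLM only for low-confidence edge cases) can be sketched as a small router. This is a minimal sketch under stated assumptions, not the skill's own code. The invoice-style line format, `INVOICE_RE`, the crude three-level confidence score, and `stub_validator` (a placeholder where a real LLM call would go) are all hypothetical. Only the 0.95 threshold comes from the skill's architecture diagram.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Threshold from the skill's architecture diagram (>= 0.95 skips the LLM).
CONFIDENCE_THRESHOLD = 0.95

# Hypothetical line format, e.g. "Invoice #123 | Total: 45.60 USD".
INVOICE_RE = re.compile(
    r"Invoice\s+#(?P<number>\d+)\s*\|\s*"
    r"Total:\s*(?P<total>\d+(?:\.\d{2})?)\s*(?P<currency>[A-Z]{3})"
)


@dataclass(frozen=True)
class Extraction:
    fields: dict[str, str]
    confidence: float
    source: str  # "regex" or "llm"


def regex_extract(line: str) -> Extraction:
    """Cheap, deterministic first pass with a crude three-level score."""
    text = line.strip()
    if (m := INVOICE_RE.fullmatch(text)) is not None:
        # The whole line matches the pattern: trust it.
        return Extraction(m.groupdict(), 1.0, "regex")
    if (m := INVOICE_RE.search(text)) is not None:
        # Pattern found, but surrounded by unexplained noise: low confidence.
        return Extraction(m.groupdict(), 0.6, "regex")
    # No match at all.
    return Extraction({}, 0.0, "regex")


# Hypothetical validator signature: receives the raw text and the regex candidate.
Validator = Callable[[str, Extraction], Extraction]


def route(line: str, validate: Validator) -> Extraction:
    """Return the regex result if confident, otherwise defer to the validator."""
    candidate = regex_extract(line)
    if candidate.confidence >= CONFIDENCE_THRESHOLD:
        return candidate
    return validate(line, candidate)


def stub_validator(line: str, candidate: Extraction) -> Extraction:
    # Placeholder for an LLM call (hypothetical); it only relabels the
    # candidate so the routing decision is visible.
    return Extraction(candidate.fields, candidate.confidence, "llm")


if __name__ == "__main__":
    lines = [
        "Invoice #123 | Total: 45.60 USD",
        "Page 3 Invoice #124 | Total: 10.00 EUR",
        "illegible scan",
    ]
    for line in lines:
        result = route(line, stub_validator)
        print(result.source, result.fields)
```

With these inputs the first line stays on the regex path, while the page-number residue and the unparseable line are handed to the validator. Replacing `stub_validator` with a real model call leaves `route` unchanged, so LLM cost scales with the share of noisy input rather than with total volume.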

# Regex vs LLM for Structured Text Parsing

A practical decision framework for parsing structured text (quizzes, forms, invoices, documents). The key insight: when the format is consistent, regex handles 95-98% of cases cheaply and deterministically, so expensive LLM calls can be reserved for the remaining edge cases.

## When to Activate

- Parsing structured text with repeating patterns (questions, forms, tables)
- Deciding between regex and LLM for text extraction
- Building hybrid pipelines that combine both approaches
- Optimizing cost/accuracy tradeoffs in text processing

## Decision Framework

```
Is the text format consistent and repeating?
├── Yes (>90% follows a pattern) → Start with Regex
│   ├── Regex handles 95%+ → Done, no LLM needed
│   └── Regex handles <95% → Add LLM for edge cases only
└── No (free-form, highly variable) → Use LLM directly
```

## Architecture Pattern

```
Source Text
    │
    ▼
[Regex Parser] ─── Extracts structure (95-98% accuracy)
    │
    ▼
[Text Cleaner] ─── Removes noise (markers, page numbers, artifacts)
    │
    ▼
[Confidence Scorer] ─── Flags low-confidence extractions
    │
    ├── High confidence (≥0.95) → Direct output
    │
    └── Low confidence (<0.95) → [LLM Validator] → Output
```

## Implementation

### 1. Regex Parser (Handles the Majority)

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedItem:
    """One extracted item, e.g. a quiz question with its choices."""

    id: str
    text: str
    choices: tuple[str, ...]
    answer: str
    # Defaults to 1.0; items scored below 0.95 go to the LLM validator.
    confidence: float = 1.0


def parse_structured_text