regex-vs-llm-structured-text

Solid

Decision framework for choosing between regex and LLM when parsing structured text — start with regex, add LLM only for low-confidence edge cases.

AI & Automation 196,640 stars 30253 forks Updated 2 days ago MIT

Install

View on GitHub

Quality Score: 96/100

Stars 20%

100

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# Regex vs LLM for Structured Text Parsing A practical decision framework for parsing structured text (quizzes, forms, invoices, documents). The key insight: regex handles 95-98% of cases cheaply and deterministically. Reserve expensive LLM calls for the remaining edge cases. ## When to Activate - Parsing structured text with repeating patterns (questions, forms, tables) - Deciding between regex and LLM for text extraction - Building hybrid pipelines that combine both approaches - Optimizing cost/accuracy tradeoffs in text processing ## Decision Framework ``` Is the text format consistent and repeating? ├── Yes (>90% follows a pattern) → Start with Regex │ ├── Regex handles 95%+ → Done, no LLM needed │ └── Regex handles <95% → Add LLM for edge cases only └── No (free-form, highly variable) → Use LLM directly ``` ## Architecture Pattern ``` Source Text │ ▼ [Regex Parser] ─── Extracts structure (95-98% accuracy) │ ▼ [Text Cleaner] ─── Removes noise (markers, page numbers, artifacts) │ ▼ [Confidence Scorer] ─── Flags low-confidence extractions │ ├── High confidence (≥0.95) → Direct output │ └── Low confidence (<0.95) → [LLM Validator] → Output ``` ## Implementation ### 1. Regex Parser (Handles the Majority) ```python import re from dataclasses import dataclass @dataclass(frozen=True) class ParsedItem: id: str text: str choices: tuple[str, ...] answer: str confidence: float = 1.0 def parse_structured_text...

Details

Author: affaan-m
Repository: affaan-m/everything-claude-code
Created: 4 months ago
Last Updated: 2 days ago
Language: JavaScript
License: MIT

Integrates with

Anthropic · AI

Related Skills

AI & Automation Featured

videodb

See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

ck

Persistent per-project memory for Claude Code. Auto-loads project context on session start, tracks sessions with git activity, and writes to native memory. Commands run deterministic Node.js scripts — behavior is consistent across model versions.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

browser

Web browser automation with AI-optimized snapshots for claude-flow agents

55,973 Updated today

ruvnet