content-hash-cache-pattern

Solid

使用SHA-256内容哈希缓存昂贵的文件处理结果——路径无关、自动失效、服务层分离。

AI & Automation 196,640 stars 30253 forks Updated 2 days ago MIT

Install

View on GitHub

Quality Score: 95/100

Stars 20%

100

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# 内容哈希文件缓存模式使用 SHA-256 内容哈希作为缓存键，缓存昂贵的文件处理结果（PDF 解析、文本提取、图像分析）。与基于路径的缓存不同，此方法在文件移动/重命名后仍然有效，并在内容更改时自动失效。 ## 何时激活 * 构建文件处理管道时（PDF、图像、文本提取） * 处理成本高且同一文件被重复处理时 * 需要一个 `--cache/--no-cache` CLI 选项时 * 希望在不修改现有纯函数的情况下为其添加缓存时 ## 核心模式 ### 1. 基于内容哈希的缓存键使用文件内容（而非路径）作为缓存键： ```python import hashlib from pathlib import Path _HASH_CHUNK_SIZE = 65536 # 64KB chunks for large files def compute_file_hash(path: Path) -> str: """SHA-256 of file contents (chunked for large files).""" if not path.is_file(): raise FileNotFoundError(f"File not found: {path}") sha256 = hashlib.sha256() with open(path, "rb") as f: while True: chunk = f.read(_HASH_CHUNK_SIZE) if not chunk: break sha256.update(chunk) return sha256.hexdigest() ``` **为什么使用内容哈希？** 文件重命名/移动 = 缓存命中。内容更改 = 自动失效。无需索引文件。 ### 2. 用于缓存条目的冻结数据类 ```python from dataclasses import dataclass @dataclass(frozen=True, slots=True) class CacheEntry: file_hash: str source_path: str document: ExtractedDocument # The cached result ``` ### 3. 基于文件的缓存存储每个缓存条目都存储为 `{hash}.json` —— 通过哈希实现 O(1) 查找，无需索引文件。 ```python import json from typing import Any def write_cache(cache_dir: Path, entry: CacheEntry) -> None: cache_dir.mkdir(parents=True, exist_ok=True) cache_file = cache_dir / f"{entry.file_hash}.json" data = serialize_entry(entry) cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8") def read_cache(cache...

Details

Author: affaan-m
Repository: affaan-m/everything-claude-code
Created: 4 months ago
Last Updated: 2 days ago
Language: JavaScript
License: MIT

Integrates with

Anthropic · AI

Related Skills

AI & Automation Featured

videodb

See, Understand, Act on video and audio. See- ingest from local files, URLs, RTSP/live feeds, or live record desktop; return realtime context and playable stream links. Understand- extract frames, build visual/semantic/temporal indexes, and search moments with timestamps and auto-clips. Act- transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real time alerts for events from live streams or desktop capture.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

ck

Persistent per-project memory for Claude Code. Auto-loads project context on session start, tracks sessions with git activity, and writes to native memory. Commands run deterministic Node.js scripts — behavior is consistent across model versions.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

browser

Web browser automation with AI-optimized snapshots for claude-flow agents

55,973 Updated today

ruvnet