web-archiving

Solid

Web page archiving and retrieval from cached/deleted sources. Use when accessing unavailable pages, preserving web content, creating legal evidence archives, or building redundant archival workflows. Covers Wayback Machine, Archive.today, ArchiveBox, and evidence preservation tools.

Web & Frontend, 233 stars, 44 forks, updated today, MIT license

Quality Score: 89/100

Stars (20% weight): 79
Recency (20% weight): 100
Frontmatter (20% weight): 70
Documentation (15% weight): 100
Issue Health (10% weight): 50
License (10% weight): 100
Description (5% weight): 100

Skill Content

# Web archiving methodology

Patterns for retrieving inaccessible web pages and preserving web content for journalism, research, and legal purposes.

## Archive service hierarchy

Try services in this order for maximum coverage:

```
┌─────────────────────────────────────────────────────────────────┐
│                    ARCHIVE RETRIEVAL CASCADE                    │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  1. Wayback Machine (archive.org)                               │
│     └─ 900B+ pages, historical depth, API access                │
│        ↓ not found                                              │
│  2. Archive.today (archive.is/archive.ph)                       │
│     └─ On-demand snapshots, paywall bypass                      │
│     └─ Caveat (2026): FBI subpoenaed registrar in Oct 2025;     │
│        Wikipedia deprecated as citation source in Feb 2026;     │
│        prefer Wayback / Perma.cc for legal or citation use      │
│        ↓ not found                                              │
│  3. Memento Time Travel (aggregator)                            │
│     └─ Searches multiple archives simultaneously                │
│                                                                 │
│  Retired (do not use): Google Cache (`cache:` operator) was     │
│  shut down in Sept 2024; Bing Cache dropdown was removed in     │
│  the same year. Both formerly fed th...
```
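The retrieval cascade can be scripted. The following is a minimal Python sketch, using only the standard library, that queries the Wayback Machine availability API (`https://archive.org/wayback/available`) and then falls back to the Memento Time Travel JSON API (`http://timetravel.mementoweb.org/api/json/<timestamp>/<url>`). Archive.today publishes no documented JSON API, so the sketch skips step 2 and relies on the Memento aggregator, which, as far as we know, also indexes archive.today. The function names (`find_archived_copy`, `wayback_lookup`, `memento_lookup`) and the injectable `fetch` parameter are hypothetical illustrations, not part of the skill. The response shapes are assumed from the two services' public documentation and may change.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple

# A fetcher takes a URL and returns the response body as text. Injecting it
# (a hypothetical design choice) keeps the cascade testable without network access.
Fetcher = Callable[[str], str]


def http_get(url: str, timeout: float = 20.0) -> str:
    """Default fetcher built on the standard library; raises on HTTP errors."""
    req = urllib.request.Request(url, headers={"User-Agent": "archive-cascade-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def wayback_lookup(target: str, fetch: Fetcher) -> Optional[str]:
    """Step 1: ask the Wayback Machine availability API for the closest snapshot."""
    api = "https://archive.org/wayback/available?" + urllib.parse.urlencode({"url": target})
    data = json.loads(fetch(api))
    # Assumed shape: {"archived_snapshots": {"closest": {"available": true, "url": ...}}};
    # "archived_snapshots" is an empty object when nothing is archived.
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def memento_lookup(target: str, fetch: Fetcher, when: Optional[str] = None) -> Optional[str]:
    """Step 3: ask the Memento Time Travel aggregator for the memento closest to `when`."""
    # Timestamp format YYYYMMDDhhmmss; defaults to the current UTC time.
    stamp = when or datetime.now(timezone.utc).strftime("%Y%m%d%H%M%S")
    api = f"http://timetravel.mementoweb.org/api/json/{stamp}/{target}"
    data = json.loads(fetch(api))
    # Assumed shape: {"mementos": {"closest": {"uri": ["https://..."], ...}}}.
    uris = data.get("mementos", {}).get("closest", {}).get("uri") or []
    return uris[0] if uris else None


def find_archived_copy(
    target: str, fetch: Fetcher = http_get, when: Optional[str] = None
) -> Optional[Tuple[str, str]]:
    """Try each service in cascade order; return (service, snapshot_url) or None."""
    steps = (
        ("wayback", lambda: wayback_lookup(target, fetch)),
        # Step 2 (Archive.today) is skipped here: it has no documented JSON API.
        ("memento", lambda: memento_lookup(target, fetch, when)),
    )
    for name, step in steps:
        try:
            found = step()
        except (OSError, ValueError, LookupError, AttributeError):
            # Network failure or HTTP error such as 404 (HTTPError is an OSError),
            # invalid JSON (ValueError), or an unexpected response shape.
            continue
        if found:
            return name, found
    return None
```

Passing a fake `fetch` lets the fallback order be tested offline. In real use, call `find_archived_copy("example.com")`; if it returns `None`, check archive.ph by hand before concluding that no copy exists.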

Details

Author: jamditis
Repository: jamditis/claude-skills-journalism
Created: 5 months ago
Last Updated: today
Language: HTML
License: MIT

Similar Skills

Semantically similar based on skill content, not just the same category.

web-archive-scraper (Web & Frontend, Listed)
Search the Wayback Machine for archived versions of websites. Extract cached pages, customer lists, testimonials, and partner directories from sites that have changed or gone offline. Uses the free CDX API; no API key needed.
711 stars, updated 3 weeks ago, by gooseworks-ai

digital-archive (Code & Development, Solid)
Digital archiving workflows with AI enrichment, entity extraction, and knowledge graph construction. Use when building content archives, implementing AI-powered categorization, extracting entities and relationships, or integrating multiple data sources. Covers patterns from the Jay Rosen Digital Archive project.
233 stars, updated today, by jamditis

archive (AI & Automation, Solid)
Archive session learnings, debugging solutions, and deployment logs to .archive/yyyy-mm-dd/ as indexed markdown with searchable tags. Use when completing a significant task, resolving a tricky bug, deploying, or when the user says "archive this". Maintains .archive/MEMORY.md index for cross-session knowledge reuse.
903 stars, updated yesterday, by ReScienceLab

content-access (Data & Documents, Solid)
Legal methods for accessing paywalled and geo-blocked content. Use when researching behind paywalls, accessing academic papers, bypassing geographic restrictions, or finding open access alternatives. Covers Unpaywall, library databases, VPNs, and ethical access strategies for journalists and researchers.
233 stars, updated today, by jamditis

web (Web & Frontend, Listed)
Search the public web with citations, fetch pages, and extract readable text.
0 stars, updated 1 month ago, by ProfSynapse