paper-writing-bench

Reverse-engineer raw materials (Sparse idea, Dense idea, experimental log) from an existing AI research paper to build a benchmark case for evaluating paper-writing pipelines. Replicates the PaperWritingBench dataset construction procedure from arXiv:2604.05018 §3 / App. C. TRIGGER when the user asks to "build a benchmark case from this paper", "reverse-engineer raw materials", or "evaluate my pipeline against PaperWritingBench".
Ar9av/PaperOrchestra · ★ 569 · AI & Automation · score 79

Install: claude install-skill Ar9av/PaperOrchestra

# PaperWritingBench (§3)

Faithful implementation of the PaperWritingBench dataset construction procedure from PaperOrchestra (Song et al., 2026, arXiv:2604.05018, §3 and App. C, F.2). The original benchmark contains 200 papers (100 from CVPR 2025 and 100 from ICLR 2025). For each paper, the authors reverse-engineer the (I, E) tuple by stripping the narrative flow from the original PDF using the three prompts in App. F.2. You can use this skill to reverse-engineer your own benchmark cases from any paper PDF.

## What this skill does

Given an existing AI research paper (PDF or markdown extract), produce:

- `idea.md` (Sparse variant): a high-level concept note with no math and no experimental results
- `idea.md` (Dense variant): a detailed technical proposal with LaTeX equations and variable definitions, but still no experimental results
- `experimental_log.md`: the exhaustive raw experimental setup, numeric data, and qualitative observations, with all narrative references stripped

These three files form a complete (I, E) input pair for the paper-orchestra pipeline. You can then run the pipeline and compare its output to the original paper using `paper-autoraters`.

## Inputs

- A paper PDF or extracted markdown text. The paper uses MinerU (Wang et al., 2024) for PDF→markdown extraction; you (the host agent) should use whatever PDF extractor your environment provides.
- For controlled experiments, you may also extract figures separately (the paper uses PDFFigures 2.0).

## Outputs

- `bench/