latent-briefing

Solid

This skill should be used when the user asks to "share memory between agents", "KV cache compaction for multi-agent", "orchestrator worker context", "latent briefing", "reduce worker tokens", "cross-agent memory without summarization", or discusses Attention Matching compaction, recursive language models with workers, or token explosion in hierarchical agents.

AI & Automation · 895 stars · 164 forks · Updated today · MIT

Quality Score: 93/100

Score weights: Stars 20% · Recency 20% · Frontmatter 20% · Documentation 15% · Issue Health 10% · License 10% · Description 5%

Skill Content

# Latent Briefing and KV Cache Memory Sharing

Hierarchical multi-agent systems often pay for the same context twice. The orchestrator accumulates a long reasoning trajectory, but each worker usually receives only a narrow text handoff such as a subtask prompt plus raw document slices. Passing the full trajectory fixes coverage but drives token cost up on every worker call. Summarization introduces latency and information loss. Retrieval helps with document access but does not preserve the orchestrator's evolving reasoning state.

Latent Briefing addresses this by sharing memory at the **representation level** rather than the text level. The core idea is to compact the orchestrator trajectory in the worker model's KV cache, keeping positions that are most relevant to the **current worker task**. The method builds on **Attention Matching (AM)** KV cache compaction and adapts it for inference-time multi-agent handoff with task-guided queries, a shared token mask across heads, and robust thresholding.

## When to Activate

Activate this skill when:

- Designing orchestrator-worker or supervisor-specialist systems where workers need access to prior orchestrator state without replaying the full trajectory as text
- Evaluating alternatives to LLM summarization or RAG for cross-agent state transfer
- Implementing or studying **KV cache compaction** as a first-class inference primitive, not only prefix caching of identical prompts
- Debugging token explosion in recursive, hierarchica...
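
The core idea described in this excerpt (task-guided queries, one token mask shared across heads, robust thresholding) can be illustrated with a small plain-Python sketch. This is not the Attention Matching algorithm or the skill's own implementation. It is a simplified stand-in that scores cached positions by how much attention the task queries give them. The function names (`token_scores`, `shared_mask`, `compact`), the nested-list tensor layout, and the median-plus-MAD threshold are all assumptions made for illustration.

```python
import math
from statistics import median


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def token_scores(queries, keys):
    """Score each cached position by the attention that task queries give it.

    queries[h][q] and keys[h][t] are vectors (lists of floats) for head h.
    Attention weights are averaged over every head and every query row, so
    the result is ONE score per position, shared by all heads.
    """
    n_tokens = len(keys[0])
    scores = [0.0] * n_tokens
    n_rows = 0
    for head_queries, head_keys in zip(queries, keys):
        for q in head_queries:
            scale = math.sqrt(len(q))
            logits = [sum(a * b for a, b in zip(q, k)) / scale for k in head_keys]
            for t, w in enumerate(softmax(logits)):
                scores[t] += w
            n_rows += 1
    return [s / n_rows for s in scores]


def shared_mask(scores, k=1.0):
    """Keep positions scoring above median + k * MAD.

    Assumption: median/MAD is one possible robust threshold; the source only
    says "robust thresholding" without naming a statistic.
    """
    med = median(scores)
    mad = median([abs(s - med) for s in scores])
    threshold = med + k * mad
    return [s > threshold for s in scores]


def compact(keys, values, mask):
    """Apply the same position mask to every head's keys and values."""
    keep = [t for t, kept in enumerate(mask) if kept]
    new_keys = [[head[t] for t in keep] for head in keys]
    new_values = [[head[t] for t in keep] for head in values]
    return new_keys, new_values, keep
```

In this sketch the worker's task prompt would supply `queries`, and the orchestrator trajectory would supply `keys` and `values`. A single mask is used for all heads so that the compacted cache keeps one consistent set of positions. Note that when all scores are identical the strict `>` comparison keeps nothing, which a real system would need to handle.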

Details

Author: guanyang
Repository: guanyang/antigravity-skills
Created: 5 months ago
Last Updated: today
Language: TypeScript
License: MIT

Integrates with

Anthropic · AI

Similar Skills

Semantically similar based on skill content — not just same category

latent-briefing · mouadja02

AI & Automation · Listed · 3 · Updated 6 days ago

latent-space-engineering · athola

Code & Development · Solid · 309 · Updated today

Shape agent behavior through instruction framing, emotional priming, and style transfer rather than information density alone.

context-compression · guanyang

AI & Automation · Solid · 895 · Updated today

This skill should be used when long-running agent sessions need context compression, structured summarization, compaction, token-per-task optimization, or durable handoff summaries that preserve decisions, files, risks, and next actions.