context-optimization

Optimize agent performance and reduce costs through context window management, including KV-cache optimization, observation masking, context compaction, and partitioning.

Introduction

This skill provides a systematic framework for context engineering, focusing on extending the effective capacity of limited LLM context windows. It is designed for developers and AI engineers building production-grade agentic systems who face challenges with context limits, rising token costs, and latency in long-running conversational or analytical workflows. By treating the context window as a finite resource, this skill enforces discipline in data curation to maintain model performance and coherence.

  • KV-cache optimization: Strategies for ordering prompts and stabilizing prefixes to maximize inference-engine cache hits, significantly reducing latency and compute costs (see the prefix-stabilization sketch after this list).

  • Observation masking: Techniques for replacing verbose tool outputs with compact references once they have been consumed, preserving critical information while reclaiming a large share of the token budget (see the masking sketch after this list).

  • Context compaction: Intelligent summarization of conversation history and tool outputs once a utilization threshold is crossed, prioritizing high-signal data while anchoring model behavior with a stable system prompt (see the compaction sketch after this list).

  • Context partitioning: Methodologies for decomposing complex tasks into smaller, isolated sub-agent contexts to avoid window saturation and improve reasoning quality (see the partitioning sketch after this list).

  • Activate this skill when encountering context-related constraints, such as token limits, degradation in reasoning accuracy for long trajectories, or the need for cost reduction at scale.

  • Expected inputs include raw prompt data, conversation trajectories, and tool-use records; outputs are optimized versions of these inputs that preserve task-critical state.

  • Practical constraints: Compaction is a lossy operation; always measure effectiveness before and after applying strategies. Prioritize KV-cache stabilization as it is zero-risk, whereas aggressive compaction should be applied carefully to avoid loss of context-sensitive information. The skill is platform-agnostic, applicable to environments like Claude Code or standard LLM SDKs.
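
To ground the first strategy, here is a minimal sketch of prefix stabilization, assuming an OpenAI-style messages list. `build_messages`, `SYSTEM_PROMPT`, and `TOOL_DEFINITIONS` are illustrative names, not part of any particular SDK; the point is that everything before the first dynamic token stays byte-identical across turns, so the inference engine can reuse its cached key/value states for that prefix.

```python
SYSTEM_PROMPT = "You are a data-analysis agent."  # frozen for the whole session

# Serialized once, in a fixed order, and passed unchanged on every call;
# reordering or re-serializing tool definitions breaks the cached prefix.
TOOL_DEFINITIONS = [
    {"name": "search", "description": "Full-text search over the corpus."},
]

def build_messages(history, new_user_turn):
    """Assemble a request whose prefix is byte-identical across calls.

    History is append-only: earlier turns are never rewritten or reordered.
    Volatile data (timestamps, request IDs) belongs at the end, after the
    cacheable prefix, where it cannot invalidate cache hits.
    """
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    messages += history
    messages.append({"role": "user", "content": new_user_turn})
    return messages, TOOL_DEFINITIONS
```

A common anti-pattern is interpolating the current time into the system prompt: it changes the very first tokens of the request and forfeits every cache hit.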
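
Observation masking can be as simple as swapping a bulky tool result for a short preview plus a retrieval key. The sketch below is illustrative, assuming message dicts with a `content` field; `_OBSERVATION_STORE`, `mask_observation`, and `fetch_observation` are hypothetical names for an out-of-band store and its accessors.

```python
import hashlib

_OBSERVATION_STORE = {}  # out-of-band store: reference key -> full output

def mask_observation(message, keep_chars=200):
    """Replace a verbose tool result with a preview plus a reference key.

    The full text stays retrievable, so masking reclaims token budget
    without destroying information the agent might still need.
    """
    text = message["content"]
    ref = "obs-" + hashlib.sha1(text.encode()).hexdigest()[:8]
    _OBSERVATION_STORE[ref] = text
    message["content"] = (
        f"{text[:keep_chars]}...\n"
        f"[masked: {len(text):,} chars elided; re-fetch with ref '{ref}']"
    )
    return ref

def fetch_observation(ref):
    """Re-expand a masked observation on demand."""
    return _OBSERVATION_STORE[ref]
```

Masking is safest after the output has served its immediate purpose, i.e., once the model has already extracted what it needed from the raw result.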
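
For compaction, one workable shape is a threshold check that leaves the system prompt and the most recent turns untouched and summarizes only the middle of the trajectory. In this sketch, `count_tokens` and `summarize` are assumed to be supplied by the caller (a tokenizer and a cheap summarization call); nothing here is a fixed API.

```python
def maybe_compact(messages, count_tokens, summarize, budget,
                  threshold=0.8, keep_recent=6):
    """Summarize the middle of the conversation once utilization is high.

    The system prompt (messages[0]) is never touched, the last `keep_recent`
    turns stay verbatim, and everything in between collapses into a single
    summary message.
    """
    used = sum(count_tokens(m["content"]) for m in messages)
    if used < threshold * budget:
        return messages                      # plenty of headroom: do nothing

    head = messages[:1]                      # stable system prompt, untouched
    middle = messages[1:-keep_recent]        # candidates for summarization
    tail = messages[-keep_recent:]           # recent turns, kept verbatim
    if not middle:
        return messages
    summary = summarize(middle)              # lossy step
    note = {"role": "system", "content": f"[compacted history] {summary}"}
    return head + [note] + tail
```

Because `summarize` is lossy, the constraint noted above applies directly: compare task performance with and without compaction before trusting it at scale.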
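
Partitioning puts each sub-task in a fresh, minimal context and lets only the distilled result flow back to the parent. In the sketch below, `llm_call` stands in for whatever completion API is in use, and the map-then-reduce shape is one possible decomposition, not the only one.

```python
def run_subtask(llm_call, system_prompt, task, facts):
    """Run one sub-task in an isolated context built from scratch.

    The sub-agent sees only the facts it needs; the parent receives only
    the return value, never the sub-agent's full trajectory.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"Task: {task}\n\nRelevant facts:\n{facts}"},
    ]
    return llm_call(messages)

def partitioned_analysis(llm_call, sections):
    """Map isolated sub-agents over sections, then reduce their summaries."""
    summaries = [
        run_subtask(llm_call, "You summarize one report section.",
                    f"Summarize section {i + 1}.", text)
        for i, text in enumerate(sections)
    ]
    return run_subtask(llm_call, "You synthesize findings.",
                       "Combine these section summaries into one analysis.",
                       "\n\n".join(summaries))
```

Each sub-agent starts far below window saturation, and the parent's context grows only by the summaries, never by the full sub-trajectories.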

Repository Stats

  • Stars: 15,322
  • Forks: 1,202
  • Open Issues: 25
  • Language: Python
  • Default Branch: main