Engineering

claude-copilot

Analyze markdown documentation files to ensure compliance with predefined AI token budgets and optimize content for efficient AI ingestion.

Introduction

The Token Budget Check skill is an automated maintenance tool designed for teams using Large Language Models (LLMs) to index, search, or process large documentation repositories. By enforcing strict token constraints, this skill ensures that markdown files—such as operational guides, agent profiles, and product architecture summaries—remain within optimized ingestion limits. It is specifically tailored for developers and technical writers who integrate documentation into AI-powered workflows, preventing performance degradation caused by context window saturation.

The skill operates by scanning file systems, inferring document types based on path patterns or frontmatter metadata, and calculating estimated token counts using a standardized conversion factor. When a file exceeds its designated budget, the skill provides actionable, granular recommendations for content compression. Users can leverage this to proactively manage knowledge bases, ensuring that documentation remains readable for both human developers and downstream AI agents while maintaining cost efficiency and accuracy during RAG (Retrieval-Augmented Generation) processes.
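The estimation and budget-check logic described above can be sketched in a few lines of Python. The tier names, budget values, and function names below are illustrative assumptions, not the skill's actual configuration; only the 1.4 words-to-tokens multiplier and the 80-100% warning band come from this page.

```python
TOKEN_MULTIPLIER = 1.4  # estimated tokens per word (per the skill's heuristic)

# Example budget tiers; real tiers and limits are defined by the skill's config.
BUDGETS = {
    "skill": 500,
    "architecture": 1200,
    "guide": 2000,
}

def estimate_tokens(text: str) -> int:
    """Estimate the token count of a document from its word count."""
    return round(len(text.split()) * TOKEN_MULTIPLIER)

def check_budget(text: str, doc_type: str) -> str:
    """Classify a document as OK, WARNING (80-100% of budget), or OVER."""
    tokens = estimate_tokens(text)
    budget = BUDGETS[doc_type]
    usage = tokens / budget
    if usage > 1.0:
        return f"OVER: {tokens}/{budget} tokens ({usage:.0%})"
    if usage >= 0.8:
        return f"WARNING: {tokens}/{budget} tokens ({usage:.0%})"
    return f"OK: {tokens}/{budget} tokens ({usage:.0%})"
```

A document type would be inferred from the file path or frontmatter before `check_budget` is called; that lookup is omitted here.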

  • Automatically estimates token counts from word count (words × 1.4) for consistent budget monitoring.

  • Supports categorized budget tiers for various document types like SKILL.md, architecture summaries, and operational guides.

  • Identifies optimization patterns including converting prose to tables, shortening examples, and removing redundant headers.

  • Built on standard Unix tools (find, wc) and bash, so it runs seamlessly in local development environments and CI/CD pipelines.

  • Generates structured token budget reports highlighting compliant files, warning levels (80-100% usage), and over-budget violations.

  • Use this skill regularly within your project root to audit documentation health before AI indexing cycles.

  • When a document is flagged, prioritize transforming dense paragraphs into structured tables or lists to achieve 30-50% token reduction.

  • The tool identifies files via pattern matching; ensure that custom documentation paths are aligned with the skill's trigger configurations.

  • Input is typically a directory path or specific markdown file; output is a formatted markdown summary with targeted improvement suggestions.

  • Note that this is an estimation tool; exact token counts may vary slightly depending on the specific tokenizer implementation of the target LLM.
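Because the skill leans on standard Unix tools, the same word-count-to-token estimate can be reproduced in one line of shell. This is a rough sketch, not the skill's actual command; the `docs/` path is an example, and integer arithmetic (×14/10) stands in for the 1.4 multiplier:

```shell
# Print a rough token estimate (word count x 1.4) for each markdown file.
find docs -name '*.md' -exec sh -c \
  'w=$(wc -w < "$1"); echo "$1: $(( w * 14 / 10 )) tokens"' _ {} \;
```

Comparing each printed estimate against the budget for its document type gives the same OK/warning/over-budget triage the skill reports.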

Repository Stats

Stars
13
Forks
3
Open Issues
3
Language
Python
Default Branch
main
Sync Status
Idle
Last Synced
May 3, 2026, 05:03 PM