semantic-compression
Aggressively prune grammatical scaffolding and filler text from inputs to optimize LLM token usage while retaining core semantic content.
144 skills found
Optimize agent performance and token usage through advanced context compression, structured summarization, and task-oriented state management for long-running sessions.
A structured repository of Agent Skills for context engineering, multi-agent architectures, and production-grade agent system optimization.
Optimize agent context windows through KV-caching, observation masking, summarization-based compaction, and context partitioning to reduce costs and latency.
Generate or edit images using AI models like FLUX and Gemini. Ideal for photos, illustrations, concept art, and visual assets, excluding technical diagrams and schematics.
A powerful CLI tool for image compression and conversion, supporting batch processing, multiple engines (mozjpeg, pngquant, sharp, etc.), format conversion (WebP, AVIF), and recursive directory optimization.
Enhance workflow efficiency by performing manual context compaction at logical task boundaries instead of relying on unpredictable auto-compaction.
Analyze Markdown files for token-wasting patterns and suggest actionable edits that make documentation more token-efficient for LLM consumption.
Anthropic Claude integration patterns: streaming, RAG with pgvector, tool use, model selection (Haiku/Sonnet/Opus), prompt caching, and cost management for AI-powered engineering.
Analyze Markdown documentation files for compliance with predefined AI token budgets and optimize content for efficient AI ingestion.
Migrate standard PostgreSQL tables to TimescaleDB hypertables with optimized partitioning, chunking, and compression strategies for time-series data.
Analyze local system hardware (RAM, CPU, GPU/VRAM) and get recommendations for suitable local LLMs, quantization settings, and performance estimates.