
massive-context-mcp

Process massive files and large codebases (10M+ tokens) by recursively chunking, sub-querying, and aggregating results to overcome LLM context limits.

Introduction

The massive-context-mcp server implements the Recursive Language Model (RLM) pattern, designed to handle inputs that exceed standard LLM context windows, such as large log files, massive datasets, or entire code repositories. By treating context as an external variable, the agent avoids stuffing the prompt with raw data; instead it uses programmatic chunking and targeted sub-querying to extract information efficiently. This approach suits developers, data scientists, and researchers who need to analyze files exceeding 100KB, or multi-file projects that would normally trigger context errors.
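
To make the pattern concrete, the following is a minimal sketch of the RLM loop in plain Python. It is not this server's actual API: the names query_llm, chunk_lines, and rlm_analyze, and the 500-line chunk size, are illustrative assumptions, with query_llm standing in for whichever backend (Claude SDK or Ollama) is configured.

```python
# Minimal sketch of the Recursive Language Model (RLM) pattern.
# All names here are illustrative; `query_llm` stands in for the
# configured backend (Claude SDK or Ollama).

def query_llm(prompt: str) -> str:
    # Placeholder: wire this to a real model backend.
    raise NotImplementedError("connect a model backend here")

def chunk_lines(text: str, lines_per_chunk: int = 500) -> list[str]:
    """Split text into fixed-size, line-based chunks."""
    lines = text.splitlines()
    return ["\n".join(lines[i:i + lines_per_chunk])
            for i in range(0, len(lines), lines_per_chunk)]

def rlm_analyze(path: str, question: str) -> str:
    # 1. Treat the oversized input as an external variable rather than
    #    pasting it wholesale into a single prompt.
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()

    # 2. Sub-query each chunk independently with a targeted question.
    partials = [
        query_llm(f"Answer from this excerpt only:\n{chunk}\n\nQ: {question}")
        for chunk in chunk_lines(text)
    ]

    # 3. Aggregate the per-chunk findings; for truly huge inputs this
    #    step can recurse over the partial answers themselves.
    merged = "\n---\n".join(partials)
    return query_llm(f"Combine these partial findings:\n{merged}\n\nQ: {question}")
```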

Key Features

  • Perform deep codebase analysis by traversing dozens of files with parallelized sub-queries.

  • Process massive log files (10MB+) by filtering specific patterns (e.g., regex error matching) and aggregating insights across chunks.

  • Support for dual-mode inference: use the high-accuracy Claude SDK (Haiku) for production tasks, or run cost-free, privacy-focused local inference via Ollama.

  • Automated orchestration with the rlm_auto_analyze tool, which detects the content type and selects the chunking strategy (lines, characters, or paragraphs) best suited to it.

  • Fine-grained control over the processing lifecycle: load, inspect, chunk, sub-query, store, and aggregate results using dedicated MCP tools (see the client sketch after this list).

  • Integration hooks for Claude Desktop and Claude Code that proactively suggest RLM whenever a file over 10KB is accessed, automatically preventing context bloat.
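
Because these are ordinary MCP tools, any MCP client can drive them. The sketch below uses the official MCP Python SDK (the mcp package) to call rlm_auto_analyze over stdio; the launch command and the argument names (path, query) are assumptions, as only the tool name appears on this page.

```python
# Calling the server from the official MCP Python SDK over stdio.
# Only `rlm_auto_analyze` is named on this page; the launch command
# and the argument shape below are illustrative assumptions.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Assumed launch command; adjust to however the server is installed.
    params = StdioServerParameters(command="massive-context-mcp", args=[])

    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # One-shot orchestration: the server detects the content
            # type and picks a chunking strategy itself.
            result = await session.call_tool(
                "rlm_auto_analyze",
                {"path": "/var/log/app.log",
                 "query": "Summarize all ERROR lines"},
            )
            print(result.content)

asyncio.run(main())
```

The finer-grained lifecycle tools would be driven the same way, one call_tool invocation per step (load, chunk, sub-query, aggregate).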

Best Practices

  • Use the RLM pattern only for tasks that truly exceed context limits; small files are better handled by standard tools.

  • Chunking strategies are context-dependent: use 'lines' for structured code or logs, 'paragraphs' for documentation or prose, and 'chars' for unstructured datasets.

  • When using sub-query batches, keep concurrency settings below 8 to ensure stable API performance (see the bounded-concurrency sketch after this list).

  • Ollama models (e.g., gemma3:12b) are excellent for budget-conscious or privacy-focused local-first setups, but ensure at least 16GB of system RAM for optimal local performance.

  • The RLM system stores intermediate results in a configurable directory (default ~/.rlm-data), allowing for iterative refinement of complex analysis tasks without re-processing chunks.
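
As a reference point for the concurrency tip above, here is one way a client could cap sub-query fan-out with asyncio.Semaphore. The run_subquery helper and the rlm_subquery tool name are hypothetical; only the below-8 guideline comes from this page.

```python
# Capping client-side sub-query fan-out. The `rlm_subquery` tool name
# is an assumption, not a documented API.
import asyncio

MAX_CONCURRENCY = 7  # stay below the suggested ceiling of 8

async def run_subquery(chunk_id: int, query: str) -> str:
    ...  # e.g. session.call_tool("rlm_subquery", {"chunk": chunk_id, "query": query})

async def query_all_chunks(chunk_ids: list[int], query: str) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def bounded(chunk_id: int) -> str:
        async with sem:  # at most MAX_CONCURRENCY requests in flight
            return await run_subquery(chunk_id, query)

    return await asyncio.gather(*(bounded(c) for c in chunk_ids))
```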

Repository Stats

Stars: 0
Forks: 0
Open Issues: 1
Language: Python
Default Branch: main