Engineering
librarian avatar

librarian

Fetch, index, and search developer documentation from GitHub and websites to provide AI agents with accurate, grounded, and version-specific code context.

Introduction

Librarian is a specialized tool designed to solve the 'hallucination' problem in AI agent workflows by providing grounded, real-time developer documentation. It empowers developers and AI agents to ingest external knowledge bases—ranging from specific GitHub repositories to public documentation websites—into a local, vector-indexed database. By utilizing hybrid search (word matching combined with semantic vector search), Librarian ensures that API details, complex code examples, and version-specific guides are retrieved with high precision. This makes it an essential utility for developers working with rapidly evolving frameworks like Next.js, Hono, or custom internal APIs where standard model training data might be outdated or insufficient.

  • Ingests documentation from GitHub repositories using specified branches or versions and public websites via sitemap, llms.txt, or direct URL crawling.

  • Supports hybrid search modes including keyword-based, semantic vector, and combined retrieval to optimize for different types of queries.

  • Maintains library-specific versioning, allowing users to scope searches to specific release labels or documentation snapshots.

  • Provides a robust CLI interface for managing sources, triggering ingestions, and testing searches locally before integrating them into agentic workflows.

  • Includes an MCP (Model Context Protocol) server to seamlessly integrate documentation lookups into AI coding assistants like Claude Code or local developer environments.

  • Auto-detects runtime environments, handling complex installation processes for Bun, npm, and pnpm automatically.

  • Prerequisites: Requires Bun at runtime to handle heavy CLI operations and indexing tasks.

  • Installation: Preferred global installation via npm install -g @iannuttall/librarian or bun add -g.

  • Workflow: Start by adding sources (add), ingest data (ingest --embed), and use the search or get commands to retrieve specific chunks of documentation.

  • Versioning: Use librarian detect within a project directory to automatically identify version labels, then scope search commands with the --version flag.

  • Performance: For large sites, use specific --depth and --pages limiters to manage crawl scope and ingestion speed.

  • Privacy: All operations are performed locally on your machine, ensuring no sensitive documentation or API keys are exposed during the retrieval process.

Repository Stats

Stars
98
Forks
8
Open Issues
1
Language
TypeScript
Default Branch
main
Sync Status
Idle
Last Synced
May 3, 2026, 05:38 PM
View on GitHub