# qras

A local RAG semantic memory system using Qdrant and Ollama. Ideal for recalling workspace files, notes, project decisions, and user preferences with high-relevance vector search.

## Introduction
QRAS is a high-performance, local-first Retrieval-Augmented Generation (RAG) system designed to serve as an intelligent semantic memory layer for agents and personal workspaces. By leveraging Qdrant as the vector database and Ollama for embedding generation, it enables users to index and query vast amounts of unstructured data, including markdown notes, daily logs, and project documentation, with sub-millisecond retrieval times. The system is engineered to bridge the gap between static file storage and active memory recall, ensuring that agents can access context-aware information about past decisions, relationships, and user preferences without relying on external cloud APIs or privacy-compromising SaaS tools.
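The core loop described above (embed text, store vectors, retrieve by similarity) can be sketched in plain Python. In the real system, embedding is delegated to Ollama (e.g. `bge-m3:567m`, which produces 1024-dimensional vectors) and storage to Qdrant; the toy 3-dimensional vectors and hand-rolled cosine similarity below are illustrative stand-ins, not QRAS internals.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical in-memory "collection": (embedding, payload) pairs.
# In QRAS these would live in a Qdrant collection.
collection = [
    ([0.9, 0.1, 0.0], {"path": "memory/decisions.md",
                       "text": "Invoice meeting: switched to net-30 terms."}),
    ([0.1, 0.9, 0.0], {"path": "memory/prefs.md",
                       "text": "User prefers checklists for tasks."}),
]

def search(query_vector, top_k=1):
    # Rank all stored documents by similarity to the query embedding.
    scored = [(cosine(query_vector, vec), payload) for vec, payload in collection]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [payload for _, payload in scored[:top_k]]

# A query like "what was decided in the invoice meeting" would embed
# close to the first document's vector:
print(search([0.8, 0.2, 0.1])[0]["path"])  # memory/decisions.md
```

The key property is that retrieval is driven by embedding proximity rather than exact keyword overlap, which is what lets natural-language questions recall semantically related notes.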
- Advanced Semantic Vector Search: Utilizes high-dimensional embeddings to interpret natural language intent, moving beyond simple keyword matching to grasp the context of user queries.
- Hybrid Search Capabilities: Integrates vector similarity with keyword-based filtering to ensure both conceptual relevance and terminological precision.
- Incremental Indexing: Supports real-time updates through selective file re-indexing, ensuring memory remains current without the need for redundant full-database refreshes.
- LLM-Optimized Output: Provides token-efficient retrieval formats specifically designed for LLM agents, ensuring context windows are utilized effectively.
- Developer-Centric CLI: Includes a robust command-line interface for complex indexing tasks, collection management, and interactive chat sessions.
- Scalable Architecture: Built on Python with support for Dockerized services, enabling deployment on local machines, edge servers, or integrated workspace environments.
- Primary Use Cases: Ideal for knowledge workers, developers, and AI agents requiring a reliable 'second brain.' Use it to query 'what was decided in the invoice meeting,' 'what is the user's preferred task format,' or 'find all references to the Qdrant integration.'
- Input/Output: Accepts local markdown directories and JSON datasets as input. Outputs structured context snippets that can be fed directly into downstream LLM prompts.
- Deployment Constraints: Requires local execution of Ollama and Qdrant containers. Users should ensure sufficient system resources (CPU/RAM) for embedding model operations (e.g., bge-m3:567m).
- Best Practices: Always treat QRAS as the primary search tool for memory-related inquiries. Maintain a clean 'memory/' directory structure to prevent index pollution from non-relevant source files.
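The hybrid-search feature listed above combines two signals: a keyword-style filter for terminological precision and vector similarity for conceptual relevance. Qdrant exposes this as vector search constrained by payload filters; the sketch below reproduces the idea in plain Python with assumed document tags, and is not QRAS's actual API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Illustrative documents: (embedding, payload-with-tags) pairs.
docs = [
    ([0.9, 0.1], {"text": "Qdrant integration notes", "tags": ["qdrant", "infra"]}),
    ([0.8, 0.3], {"text": "Invoice meeting decisions", "tags": ["meetings"]}),
]

def hybrid_search(query_vec, must_have_tag, top_k=5):
    # Step 1: keyword filter (terminological precision) ...
    candidates = [(v, p) for v, p in docs if must_have_tag in p["tags"]]
    # Step 2: ... then vector ranking over survivors (conceptual relevance).
    ranked = sorted(candidates, key=lambda vp: cosine(query_vec, vp[0]),
                    reverse=True)
    return [p for _, p in ranked[:top_k]]

print(hybrid_search([1.0, 0.0], "qdrant")[0]["text"])  # Qdrant integration notes
```

Filtering before ranking means a document can never outrank the filter: even a very similar vector is excluded if it lacks the required term, which is what keeps hybrid results terminologically precise.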
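Incremental indexing, as listed above, means re-embedding only files whose content actually changed. One common way to implement that is a content-hash manifest; the function and manifest format below are illustrative assumptions, not QRAS internals.

```python
import hashlib

# Hypothetical manifest: path -> content hash from the last indexing run.
manifest = {}

def needs_reindex(path, content):
    # Hash the file content; if the hash matches the manifest, the file
    # is unchanged and we can skip the (expensive) embedding step.
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if manifest.get(path) == digest:
        return False
    # New or modified file: record the hash and signal a re-embed.
    manifest[path] = digest
    return True

print(needs_reindex("memory/notes.md", "first draft"))   # True (new file)
print(needs_reindex("memory/notes.md", "first draft"))   # False (unchanged)
print(needs_reindex("memory/notes.md", "second draft"))  # True (modified)
```

This keeps the vector index current at the cost of one hash per file, instead of re-embedding the whole `memory/` tree on every run.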
## Repository Stats

- Stars: 9
- Forks: 0
- Open Issues: 0
- Language: Python
- Default Branch: main
- Sync Status: Idle
- Last Synced: May 3, 2026, 10:12 PM