Engineering

rag-engineer

Architect and optimize production-grade RAG systems. Master embedding models, vector databases, chunking strategies, and retrieval pipelines for high-accuracy LLM applications.

Introduction

The RAG Engineer skill provides a robust architectural framework for designing and maintaining Retrieval-Augmented Generation (RAG) systems. It is intended for software engineers, data scientists, and AI architects who need to bridge the gap between raw unstructured data and accurate LLM generation. The skill emphasizes that retrieval quality directly determines generation quality, advocating for a rigorous, data-first approach to building search-augmented AI.

  • Expert selection and fine-tuning of embedding models suited to specific content domains such as code, legal text, or technical documentation.

  • Design and management of vector database architecture, focusing on scalability and high-performance similarity search.

  • Implementation of advanced chunking strategies, including semantic chunking, hierarchical retrieval, and context continuity via chunk overlap (see the chunking sketch after this list).

  • Development of hybrid search systems that fuse BM25, TF-IDF, and vector similarity results with Reciprocal Rank Fusion (see the RRF sketch after this list).

  • Integration of query expansion techniques such as Hypothetical Document Embeddings (HyDE) and multi-query retrieval to improve system recall (see the multi-query sketch after this list).

  • Advanced context management, including contextual compression, metadata filtering, and cross-encoder re-ranking to optimize precision (see the re-ranking sketch after this list).

  • Users should apply this skill when building AI agents that require external knowledge, domain-specific expertise, or up-to-date information that falls outside the LLM's base training data.

  • Input requirements include raw document corpora, query logs, and relevance benchmarks for evaluation.

  • Expected outputs are production-ready retrieval pipelines that maintain high recall and precision while minimizing hallucinations.

  • Practical constraints include balancing context window limits, managing embedding model blind spots, and mitigating the fragmentation introduced by fixed-size chunking.

  • It is critical to treat the retrieval pipeline as a distinct modular component; always evaluate retrieval metrics such as hit rate and MRR independently of the quality of the final generated output (see the evaluation sketch after this list).
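The chunking item above mentions preserving context continuity via overlap. A minimal character-window sketch in Python; the 500/100 sizes are illustrative defaults, not values prescribed by the skill:

```python
def chunk_text(text, chunk_size=500, overlap=100):
    """Split text into fixed-size character windows with overlap,
    so content that spans a boundary appears in two adjacent chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks
```

Semantic or hierarchical chunking would replace the fixed window with boundaries derived from document structure, but the overlap idea carries over unchanged.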
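The hybrid search item refers to Reciprocal Rank Fusion. A small sketch that fuses ranked lists of document ids from, say, BM25 and vector search; k=60 is the commonly used constant, and the example ids below are hypothetical:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking
    by summing 1 / (k + rank) for each list a document appears in."""
    scores = {}
    for ranked_ids in rankings:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical example: fuse a lexical ranking with a dense ranking.
fused = reciprocal_rank_fusion([["d3", "d1", "d7"], ["d1", "d9", "d3"]])
```

Documents that rank well in either retriever rise to the top without any score normalization, which is why RRF is a common default for combining lexical and dense results.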
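The query expansion item mentions multi-query retrieval. A sketch of the idea, assuming you supply your own `rewrite_fn` (an LLM call that rephrases the question) and `search_fn` (your retriever); both names are placeholders, not APIs defined by the skill:

```python
def multi_query_retrieve(question, rewrite_fn, search_fn, n_variants=3, top_k=5):
    """Retrieve with several rephrasings of the question and union the
    results, deduplicating by document id.

    rewrite_fn(question, n) -> list[str]       # LLM-generated variants
    search_fn(query, top_k) -> list[(id, text)]  # your retriever
    """
    queries = [question] + rewrite_fn(question, n_variants)
    seen, results = set(), []
    for q in queries:
        for doc_id, text in search_fn(q, top_k):
            if doc_id not in seen:
                seen.add(doc_id)
                results.append((doc_id, text))
    return results
```

HyDE follows the same shape, except the rewrite step produces a hypothetical answer document whose embedding is used for the search instead of the raw question.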
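The context management item mentions cross-encoder re-ranking. One way to do this is with the sentence-transformers CrossEncoder class; the model name below is a common public checkpoint used here purely as an example:

```python
from sentence_transformers import CrossEncoder

def rerank(query, candidates, model_name="cross-encoder/ms-marco-MiniLM-L-6-v2", top_k=5):
    """Re-score retrieved passages jointly with the query and keep the best.
    Loading the model per call keeps the sketch self-contained; in
    production you would load it once and reuse it."""
    model = CrossEncoder(model_name)
    scores = model.predict([(query, passage) for passage in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```

The cross-encoder is far more expensive than a bi-encoder, so it is typically applied only to the top few dozen candidates returned by the first-stage retriever.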
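The final item stresses evaluating retrieval with hit rate and MRR independently of generation. A minimal sketch over a batch of evaluation queries, assuming you already have ranked results and relevance judgments:

```python
def hit_rate_and_mrr(retrieved_lists, relevant_sets):
    """Compute hit rate (fraction of queries with at least one relevant
    document retrieved) and mean reciprocal rank over a query set.

    retrieved_lists[i]: ranked list of ids returned for query i
    relevant_sets[i]:   set of ids judged relevant for query i
    """
    if not retrieved_lists:
        return 0.0, 0.0
    hits, rr_total = 0, 0.0
    for retrieved, relevant in zip(retrieved_lists, relevant_sets):
        ranks = [r for r, doc_id in enumerate(retrieved, start=1) if doc_id in relevant]
        if ranks:
            hits += 1
            rr_total += 1.0 / ranks[0]
    n = len(retrieved_lists)
    return hits / n, rr_total / n
```

Tracking these two numbers over the retrieval stage alone makes regressions visible before they are masked, or amplified, by the generator.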

Repository Stats

Stars: 35,783
Forks: 5,870
Open Issues: 0
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: Apr 30, 2026, 10:50 AM