clarity-gate

Enforces epistemic quality in RAG systems through pre-ingestion verification, ensuring documents are properly qualified and structured before they enter the knowledge base.

Introduction

Clarity Gate is an open-source epistemic verification system designed to keep LLMs in RAG pipelines from mistaking assumptions or projections for established facts. It places a strict gate in front of ingestion: each document must be properly qualified, validated against a Source of Truth (SOT), and marked for uncertainty where necessary. The system separates two tasks. Detection identifies hedges that already exist in the text. Enforcement requires uncertainty markers on projections or claims that lack grounding.
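The difference between detection and enforcement can be illustrated with a minimal sketch. The word lists, the function names `detect_hedges` and `enforce`, and the regex-based approach are all assumptions made for illustration; they are not taken from the Clarity Gate codebase, which may identify claims quite differently.

```python
import re

# Hypothetical marker lists, chosen only for illustration.
HEDGES = re.compile(
    r"\b(may|might|could|likely|estimated|approximately|projected|assum\w*)\b",
    re.IGNORECASE,
)
PROJECTIONS = re.compile(
    r"\b(will|by 20\d\d|next (?:year|quarter))\b",
    re.IGNORECASE,
)


def detect_hedges(sentence: str) -> list[str]:
    """Detection: report the hedge words already present in a sentence."""
    return [m.group(0).lower() for m in HEDGES.finditer(sentence)]


def enforce(sentence: str) -> bool:
    """Enforcement: a forward-looking sentence passes only if it is hedged.

    Sentences that are not projections pass unchanged.
    """
    is_projection = bool(PROJECTIONS.search(sentence))
    return (not is_projection) or bool(detect_hedges(sentence))


if __name__ == "__main__":
    for s in (
        "Revenue will reach $5M by 2027.",
        "Revenue may reach $5M by 2027.",
        "The service returned HTTP 200 during the test run.",
    ):
        print(f"{'PASS' if enforce(s) else 'BLOCK'}: {s}")
```

In this sketch, detection only reports what is there, while enforcement turns a missing hedge on a projection into a failing check.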

The system uses a structured format, Clarity-Gated Documents (CGD), and provides deterministic Python tools for claim identification and document hashing, so verification gives consistent results across platforms such as Claude Code, Cursor, and various CLI environments. It is intended for researchers, developers, and organizations building high-integrity RAG applications who need to verify the epistemic basis of their knowledge corpus.
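As a rough illustration of deterministic claim identification, the sketch below derives an ID from the SHA-256 digest of a claim's text. The `claim-` prefix, the 12-character truncation, and the whitespace-collapsing step are assumptions for this example only; the actual ID format is defined by the project's own tools and specification.

```python
import hashlib


def claim_id(claim_text: str, length: int = 12) -> str:
    """Return a deterministic ID for a claim (hypothetical format).

    Runs of whitespace are collapsed first so that re-wrapped text
    produces the same ID as the original.
    """
    normalized = " ".join(claim_text.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"claim-{digest[:length]}"


if __name__ == "__main__":
    a = claim_id("Latency dropped after the cache was added.")
    b = claim_id("Latency  dropped after\nthe cache was added.")
    print(a, a == b)
```

Because the ID depends only on the claim text, the same claim receives the same ID on every machine and every run, which is what makes tracking across tools possible.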

  • Enforces epistemic quality standards through automated pre-ingestion checks.

  • Supports deterministic claim tracking using hash-based unique IDs (SHA-256).

  • Includes formal specifications (FORMAT_SPEC v2.1) for document structure and verification.

  • Provides automated validation codes to identify schema errors and structural anomalies in HITL (Human-In-The-Loop) records.

  • Validates SOT files to ensure factual claims match verified evidence before data ingestion.

  • Suitable for technical documentation, meeting notes, project specifications, and hypothesis-heavy datasets.

  • Requires HITL review for final verification of factual accuracy.

  • Operates as a gatekeeper: documents failing epistemic checks are blocked from the RAG knowledge base until corrected.

  • Integrates with standard developer workflows via .claude/skills/, .codex/, and .github/ directories.

  • Does not itself verify factual truth; it enforces only the epistemic form of claims and the documentation of methodology.

  • Features robust canonicalization for hashing, ensuring content consistency across different OS environments and whitespace variations.
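The canonicalization point in the list above can be sketched as follows. The specific rules used here (Unicode NFC normalization, LF line endings, stripped trailing whitespace, a single final newline) are assumptions chosen to show the idea; the rules Clarity Gate actually applies are those in its format specification and may differ.

```python
import hashlib
import unicodedata


def canonicalize(text: str) -> str:
    """Normalize text so that equivalent documents hash identically.

    Assumed rules, for illustration only: NFC Unicode form, LF line
    endings, no trailing whitespace on any line, one final newline.
    """
    text = unicodedata.normalize("NFC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    lines = [line.rstrip() for line in text.split("\n")]
    while lines and lines[-1] == "":
        lines.pop()  # drop trailing blank lines
    return "\n".join(lines) + "\n"


def document_hash(text: str) -> str:
    """SHA-256 hex digest of the canonicalized document."""
    return hashlib.sha256(canonicalize(text).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    unix_doc = "Title\n\nBody text.\n"
    windows_doc = "Title\r\n\r\nBody text.  \r\n\r\n"
    print(document_hash(unix_doc) == document_hash(windows_doc))
```

Hashing the canonical form rather than the raw bytes means a file saved on Windows and the same file saved on Linux yield the same digest, while any change to the visible content still changes it.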

Repository Stats

Stars: 27
Forks: 3
Open Issues: 0
Language: Python
Default Branch: main
Last Synced: May 4, 2026, 12:57 AM