sherlock-review

Evidence-based code review using Sherlock Holmes-style deductive reasoning to verify implementation claims, investigate bugs, and conduct root cause analysis.

Introduction

Sherlock Review is a specialized quality engineering skill designed to eliminate ambiguity in software development by treating pull requests (PRs), bug reports, and performance claims as investigative cases. Rather than relying on superficial code reading, it enforces a systematic, evidence-based approach: Observe, Deduce, Eliminate, and Conclude. This agentic tool is intended for senior engineers, QA leads, and developers who need to verify whether a proposed fix actually addresses the root cause or merely masks a symptom. It is particularly effective for high-stakes scenarios such as verifying performance improvements, validating complex security claims, or performing post-mortem root cause analysis on flaky tests.
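The Observe, Deduce, Eliminate, Conclude sequence can be pictured as a small hypothesis-elimination loop. The sketch below is only an illustration under stated assumptions: the `Observation` and `Hypothesis` types, the `investigate` function, and the example data are all hypothetical and do not come from the skill itself.

```typescript
// Hypothetical types for illustration; none of these names come from sherlock-review.
type Observation = { fact: string };

interface Hypothesis {
  name: string;
  // Returns true when the given observation contradicts this hypothesis.
  contradictedBy: (obs: Observation) => boolean;
}

interface CaseResult {
  eliminated: string[];
  remaining: string[];
}

// Observe -> Deduce -> Eliminate -> Conclude in its simplest form:
// any hypothesis contradicted by at least one observation is ruled out,
// and whatever survives is what the investigation concludes with.
function investigate(observations: Observation[], hypotheses: Hypothesis[]): CaseResult {
  const eliminated: string[] = [];
  const remaining: string[] = [];
  for (const h of hypotheses) {
    const ruledOut = observations.some((obs) => h.contradictedBy(obs));
    if (ruledOut) {
      eliminated.push(h.name);
    } else {
      remaining.push(h.name);
    }
  }
  return { eliminated, remaining };
}

// Made-up example case: a flaky test.
const observations: Observation[] = [
  { fact: "test fails only under parallel execution" },
];

const hypotheses: Hypothesis[] = [
  {
    name: "logic bug in the function under test",
    // A plain logic bug would also fail in serial runs.
    contradictedBy: (o) => o.fact.includes("only under parallel execution"),
  },
  {
    name: "shared state between tests",
    contradictedBy: () => false,
  },
];

const result = investigate(observations, hypotheses);
console.log(result);
```

In this toy case the single observation rules out the logic-bug hypothesis, leaving shared state as the explanation to investigate further.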

  • Employs Holmesian logic to rule out the impossible and trust only reproducible data.

  • Automates evidence collection by cross-referencing commit histories, git diffs, test coverage reports, and runtime behavior.

  • Generates structured investigation reports, providing a clear verdict (TRUE, PARTIALLY TRUE, FALSE, or NONSENSICAL) with weighted findings.

  • Includes specialized templates for investigating bug fixes, performance optimizations, and edge case handling.

  • Supports multi-agent fleet coordination, allowing the agent to call upon specialized security or performance auditors for deeper verification.

  • To use, provide the agent with specific claims from a PR or commit message (e.g., 'Fixes memory leak', 'Improves performance 30%').

  • Inputs include the target commit range or PR number, and the agent outputs a rigorous markdown report detailing the claim versus the reality.

  • Requires local environment access to run tests, benchmarks, or git log operations for data collection.

  • Recommended for critical review cycles where technical accuracy is paramount and standard code review processes are insufficient.

  • Note that this skill prioritizes empirical evidence; if an investigation finds no supporting data, the agent will flag the claim as unsupported, encouraging more thorough validation before merging code.
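To make the verdict and report bullets above more concrete, here is a minimal sketch of how weighted findings might be reduced to one of the four verdicts and rendered as markdown. Everything in it is an assumption for illustration: the `Finding` and `Investigation` shapes, the `decideVerdict` and `renderReport` functions, the 0.8 and 0.4 thresholds, the mapping of untestable claims to NONSENSICAL, and the sample findings are not taken from the skill.

```typescript
type Verdict = "TRUE" | "PARTIALLY TRUE" | "FALSE" | "NONSENSICAL";

// Hypothetical shapes; the real skill's data model is not documented here.
interface Finding {
  summary: string;
  supportsClaim: boolean;
  weight: number; // assumed: non-negative, larger means stronger evidence
}

interface Investigation {
  claim: string;
  testable: boolean; // assumption: an untestable claim maps to NONSENSICAL
  findings: Finding[];
}

// Returns undefined when there is no evidence at all, i.e. the claim is unsupported.
function decideVerdict(inv: Investigation): Verdict | undefined {
  if (!inv.testable) return "NONSENSICAL";
  const total = inv.findings.reduce((sum, f) => sum + f.weight, 0);
  if (total === 0) return undefined;
  const supporting = inv.findings
    .filter((f) => f.supportsClaim)
    .reduce((sum, f) => sum + f.weight, 0);
  const ratio = supporting / total;
  if (ratio >= 0.8) return "TRUE"; // assumed threshold
  if (ratio >= 0.4) return "PARTIALLY TRUE"; // assumed threshold
  return "FALSE";
}

// Renders the claim, the verdict, and each finding as a small markdown report.
function renderReport(inv: Investigation): string {
  const verdict = decideVerdict(inv);
  const verdictText = verdict === undefined ? "UNSUPPORTED (no evidence found)" : verdict;
  const lines = [
    `## Claim: ${inv.claim}`,
    "",
    `**Verdict:** ${verdictText}`,
    "",
    "### Findings",
    ...inv.findings.map(
      (f) => `- [${f.supportsClaim ? "supports" : "contradicts"}, weight ${f.weight}] ${f.summary}`
    ),
  ];
  return lines.join("\n");
}

// Made-up findings for a claim quoted in the list above.
const report = renderReport({
  claim: "Improves performance 30%",
  testable: true,
  findings: [
    { summary: "Benchmark A is faster after the change", supportsClaim: true, weight: 2 },
    { summary: "Benchmark B shows no measurable change", supportsClaim: false, weight: 2 },
  ],
});
console.log(report);
```

Keeping the "no evidence" case separate from FALSE mirrors the last bullet: a claim with nothing behind it is flagged as unsupported, not declared wrong.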

Repository Stats

Stars: 329
Forks: 65
Open Issues: 4
Language: TypeScript
Default Branch: main
Sync Status: Idle
Last Synced: Apr 29, 2026, 07:00 AM