
sherlock-review

Evidence-based investigative code review using deductive reasoning to verify implementation claims, investigate bugs, and conduct root cause analysis through systematic observation.

Introduction

Sherlock-review is a specialized quality engineering skill designed for forensic code analysis and rigorous validation of developer claims. It adopts a Holmesian approach to software maintenance, requiring agents to gather empirical evidence before reaching conclusions about a bug fix, feature implementation, or performance improvement. The skill moves beyond surface-level reading, mandating that code be executed and verified against independent tests to confirm it actually does what the PR description claims. It is intended for senior engineers, QA specialists, and investigators who need to resolve ambiguity in technical reports and ensure that fixes address root causes rather than symptoms. It is particularly effective for high-stakes debugging sessions where "it works on my machine" is not a sufficient explanation.

  • Performs automated evidence collection using git diffs, log analysis, and local test execution.

  • Implements a structured 3-step investigation workflow: Observe (gather data), Deduce (compare claim vs. reality), and Conclude (verdict with proof).

  • Utilizes a rigorous classification system for findings: TRUE, PARTIALLY TRUE, FALSE, or NONSENSICAL.

  • Enforces a minimum findings threshold (weight-based) to ensure every investigation produces actionable, high-quality intelligence.

  • Provides a standardized Investigation Template to ensure consistency in reports and documentation.

  • Features integration with fleet coordination tools, allowing the skill to trigger concurrent domain-specific audits (security, performance, testing) based on the claim type.
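The three-step workflow, verdict scale, and weight-based findings threshold described above can be sketched as follows. This is a minimal illustration, not the skill's actual API: every name here (`Evidence`, `Finding`, `deduce`, `conclude`, the `minWeight` default) is a hypothetical stand-in.

```typescript
// Hypothetical sketch of the Observe → Deduce → Conclude workflow
// with the TRUE / PARTIALLY TRUE / FALSE / NONSENSICAL verdict scale.

type Verdict = "TRUE" | "PARTIALLY TRUE" | "FALSE" | "NONSENSICAL";

interface Evidence {
  source: "git-diff" | "log" | "test-run"; // where the observation came from
  supportsClaim: boolean;                  // does it confirm the PR claim?
}

interface Finding {
  description: string;
  weight: number; // contribution toward the minimum findings threshold
}

// Deduce: compare the claim against the collected evidence.
function deduce(evidence: Evidence[]): Verdict {
  if (evidence.length === 0) return "NONSENSICAL"; // nothing to verify
  const supporting = evidence.filter((e) => e.supportsClaim).length;
  if (supporting === evidence.length) return "TRUE";
  if (supporting === 0) return "FALSE";
  return "PARTIALLY TRUE";
}

// Conclude: emit a verdict only once findings meet the weight threshold.
function conclude(verdict: Verdict, findings: Finding[], minWeight = 3): string {
  const total = findings.reduce((sum, f) => sum + f.weight, 0);
  if (total < minWeight) {
    return `INSUFFICIENT EVIDENCE (weight ${total} < ${minWeight})`;
  }
  return `${verdict} (${findings.length} findings, total weight ${total})`;
}
```

In this sketch the Observe step is assumed to have already produced the `Evidence[]` array (in practice by shelling out to git, test runners, and log files); `deduce` and `conclude` then make the claim-versus-reality comparison explicit and gate the final verdict on accumulated finding weight.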

  • Always prioritize reproducibility; trust only code that passes verified test assertions.

  • Be prepared to identify red flags such as silent error swallowing, performance testing on toy datasets, or workarounds instead of genuine architectural fixes.

  • The skill requires access to the repository's git history, test suites, and environment-specific runtime configurations.

  • Use the provided investigation template for all reports to maintain the expected output format for other automated pipeline agents.

  • Combine with brutal-honesty-review if the findings suggest systemic negligence or repeated architectural failures.
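One way to picture the standardized Investigation Template and the red-flag checks mentioned above is the sketch below. The interface fields and regex heuristics are assumptions for illustration only; the skill's real template and detection logic may differ.

```typescript
// Hypothetical shape for an investigation report, plus a simple
// red-flag scan over a diff. Field names and patterns are illustrative.

interface InvestigationReport {
  claim: string;          // what the PR description asserts
  observations: string[]; // raw evidence: diffs, logs, test output
  deduction: string;      // claim vs. reality comparison
  verdict: "TRUE" | "PARTIALLY TRUE" | "FALSE" | "NONSENSICAL";
  redFlags: string[];
}

// Heuristic patterns for two of the red flags listed above:
// silent error swallowing and workaround markers.
const RED_FLAG_PATTERNS: Array<[RegExp, string]> = [
  [/catch\s*\([^)]*\)\s*\{\s*\}/, "silent error swallowing (empty catch)"],
  [/\b(TODO|HACK|workaround)\b/i, "workaround instead of a genuine fix"],
];

function scanForRedFlags(diff: string): string[] {
  return RED_FLAG_PATTERNS
    .filter(([pattern]) => pattern.test(diff))
    .map(([, label]) => label);
}
```

A downstream pipeline agent consuming `InvestigationReport` objects would then see every investigation in the same shape, which is the point of enforcing a single template.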

Repository Stats

Stars: 329
Forks: 65
Open Issues: 4
Language: TypeScript
Default Branch: main
Sync Status: Idle
Last Synced: Apr 28, 2026, 12:37 PM