flaky-detect
Identify, categorize, and troubleshoot flaky tests by analyzing CI history, execution patterns, and code structure to improve test suite reliability.
Introduction
The Flaky Detect skill is an engineering utility for isolating non-deterministic test failures, commonly known as flaky tests, which cost engineering teams significant productivity. By combining historical CI data with static code analysis, it helps developers move beyond ad hoc debugging to root-cause resolution of intermittent failures. It is aimed at CI/CD engineers, QA leads, and software developers dealing with instability in large test suites.
- Analyzes GitHub Actions and custom CI logs to detect statistical pass/fail anomalies across test runs.
- Categorizes flaky behavior based on established industry taxonomies, including Async/Timing issues, Test Order dependencies, Environmental differences, Resource limitations, and Non-deterministic logic.
- Performs static analysis of test files to identify problematic patterns like explicit timeouts, unawaited promises, uncontrolled randomness, and external environment dependencies.
- Provides actionable recommendations for test improvement, such as implementing mocked time (e.g., vi.setSystemTime) or ensuring test isolation.
- Supports multi-run analysis for high-confidence identification of intermittent failures.
- The skill is best triggered when users report that CI pipelines are failing unpredictably or that specific tests pass and fail without changes to the underlying logic.
- Input requirements include access to historical CI logs or test execution results. Output is provided as a structured report detailing pass rates, root-cause categorization, specific line-level vulnerabilities, and proposed code fixes.
- Practical constraints include the need for a sufficient volume of historical data (e.g., at least 5 runs) to calculate reliable pass rates. While it provides automated detection, manual verification of proposed fixes is recommended to ensure test intent is preserved.
- Use this tool when auditing test reliability, preparing for releases, or performing maintenance on legacy testing infrastructure to ensure stable, repeatable deployments.
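The pass-rate calculation described above can be illustrated with a minimal TypeScript sketch. It assumes run results have already been extracted from CI logs into simple records. The TestRun and FlakyReport shapes, the classifyTests function, and the verdict labels are hypothetical names chosen for illustration, not the skill's actual API. A test with mixed outcomes over at least five runs is labelled flaky; tests with fewer runs are reported as lacking data rather than classified.

```typescript
// Hypothetical record of one test execution; not the skill's actual schema.
interface TestRun {
  testName: string;
  passed: boolean;
}

// Hypothetical per-test summary produced by this sketch.
interface FlakyReport {
  testName: string;
  runs: number;
  passRate: number; // fraction of runs that passed, between 0 and 1
  verdict: "stable" | "flaky" | "failing" | "insufficient-data";
}

// Mirrors the "at least 5 runs" guidance; the exact threshold is an assumption.
const MIN_RUNS = 5;

function classifyTests(history: TestRun[], minRuns: number = MIN_RUNS): FlakyReport[] {
  // Group pass/fail outcomes by test name.
  const outcomes: Record<string, boolean[]> = {};
  for (const run of history) {
    if (!Object.prototype.hasOwnProperty.call(outcomes, run.testName)) {
      outcomes[run.testName] = [];
    }
    outcomes[run.testName].push(run.passed);
  }

  return Object.keys(outcomes).map((testName) => {
    const results = outcomes[testName];
    const passes = results.filter((passed) => passed).length;
    const passRate = passes / results.length;

    let verdict: FlakyReport["verdict"];
    if (results.length < minRuns) {
      verdict = "insufficient-data"; // too few runs for a reliable pass rate
    } else if (passRate === 1) {
      verdict = "stable";
    } else if (passRate === 0) {
      verdict = "failing"; // consistently red, so deterministic rather than flaky
    } else {
      verdict = "flaky"; // mixed outcomes across enough runs
    }

    return { testName, runs: results.length, passRate, verdict };
  });
}
```

This sketch ignores code changes between runs, so a genuine regression followed by a fix would also show up as mixed outcomes. Restricting the history to reruns of the same commit is one way to tighten the signal.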
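The static-analysis step listed above can be approximated, very roughly, with a line-by-line pattern scan. The sketch below is an assumption-laden illustration: the RULES table, the scanTestSource function, and the mapping of each pattern to a category are hypothetical, and a regex scan is far cruder than real static analysis. It covers explicit timeouts, uncontrolled randomness, and environment dependencies only; detecting unawaited promises reliably needs type information and is left out.

```typescript
// Hypothetical rule shape; the patterns the skill actually checks may differ.
interface PatternRule {
  category: string;
  description: string;
  pattern: RegExp; // no "g" flag, so test() carries no state between calls
}

interface Finding {
  line: number; // 1-based line number in the scanned source
  category: string;
  description: string;
}

// Illustrative rules; the category assignments are assumptions.
const RULES: PatternRule[] = [
  { category: "Async/Timing", description: "explicit timeout", pattern: /\bsetTimeout\s*\(/ },
  { category: "Non-deterministic logic", description: "uncontrolled randomness", pattern: /\bMath\.random\s*\(/ },
  { category: "Environmental differences", description: "environment variable dependency", pattern: /\bprocess\.env\b/ },
];

function scanTestSource(source: string): Finding[] {
  const findings: Finding[] = [];
  const lines = source.split(/\r?\n/);

  lines.forEach((text, index) => {
    for (const rule of RULES) {
      if (rule.pattern.test(text)) {
        findings.push({
          line: index + 1,
          category: rule.category,
          description: rule.description,
        });
      }
    }
  });

  return findings;
}
```

Each finding carries a line number so it can feed the line-level part of a report. Matches inside comments or string literals will produce false positives, which is one reason manual review of the results remains necessary.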
Repository Stats
- Stars: 127
- Forks: 18
- Open Issues: 1
- Language: TypeScript
- Default Branch: main
- Sync Status: Idle
- Last Synced: Apr 30, 2026, 08:00 AM