ab-test-analysis
Analyze A/B test results with statistical rigor. Includes significance testing, sample size validation, guardrail monitoring, and actionable recommendations (ship/extend/stop) using Python scripts.
Introduction
This skill acts as a data-driven partner for product managers and analysts tasked with evaluating experiment outcomes. It provides a structured workflow for interpreting A/B test data, moving beyond surface-level metrics to determine if a variant's performance is statistically significant, practically meaningful, and safe for production. The skill is designed for scenarios where you need to move from raw data to a clear 'ship', 'extend', or 'stop' decision based on rigorous statistical standards.
- Perform statistical significance testing using two-tailed z-tests or chi-squared tests to calculate p-values and confidence intervals.
- Validate sample size requirements using power analysis (power = 1 - β, where β is the Type II error rate) to ensure experiments are not underpowered.
- Monitor guardrail metrics such as revenue, engagement, or page load times to ensure that lifts in the primary metric don't come at the cost of overall system health.
- Calculate relative lift and assess whether observed trends are transient or persistent.
- Automatically generate and execute Python scripts to process raw CSV, Excel, or analytics export files for precise numerical analysis.
- Detect Sample Ratio Mismatch (SRM) and account for potential novelty or primacy effects that could skew interpretation.
- Inputs: Experiment hypothesis, primary and guardrail metrics, traffic split ratios, duration data, and optional raw data files (CSV/Excel).
- Outputs: A structured analysis report featuring a summary table of metrics, statistical validation results, and an evidence-based recommendation for the experiment's next steps.
- Target Audience: Product Managers, Data Analysts, and Growth Leads performing rapid iteration and validation.
- Constraints: Metrics and the hypothesis must be clearly defined before analysis to ensure validity; automated Python processing requires appropriate data access; results are bounded by input data quality and the integrity of the experimental design.
- Integration: This skill is part of the pm-data-analytics plugin and is frequently used alongside north-star metrics definition and cohort analysis workflows to support a holistic product strategy.
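The statistical checks listed above (significance testing, relative lift, sample size validation, and SRM detection) can be illustrated with a minimal Python sketch. This is not the skill's actual generated script: the function names (`two_proportion_ztest`, `required_n_per_arm`, `srm_p_value`) and the example numbers are hypothetical, and the formulas assume a binary conversion metric, a two-arm test, and the normal approximation. Only the Python standard library is used.

```python
from math import ceil, sqrt
from statistics import NormalDist

_norm = NormalDist()  # standard normal distribution


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-tailed z-test for a difference in conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Pooled standard error is used for the test statistic (null: p_a == p_b).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - _norm.cdf(abs(z)))
    # Unpooled standard error is used for the confidence interval.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    return {
        "z": z,
        "p_value": p_value,
        "abs_diff": diff,
        "ci": (diff - z_crit * se, diff + z_crit * se),
        "relative_lift": diff / p_a,
    }


def required_n_per_arm(baseline, mde_abs, alpha=0.05, power=0.8):
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p1, p2 = baseline, baseline + mde_abs
    z_alpha = _norm.inv_cdf(1 - alpha / 2)
    z_beta = _norm.inv_cdf(power)  # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """Chi-squared goodness-of-fit test (1 degree of freedom) for SRM."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # With 1 degree of freedom, chi2 is the square of a standard normal,
    # so the upper-tail probability follows from the normal CDF.
    return 2 * (1 - _norm.cdf(sqrt(chi2)))


if __name__ == "__main__":
    # Made-up illustration: 10.0% vs 11.0% conversion, 10,000 users per arm.
    result = two_proportion_ztest(1000, 10_000, 1100, 10_000)
    print(result)
    print("needed per arm:", required_n_per_arm(0.10, 0.01))
    print("SRM p-value:", srm_p_value(10_000, 10_000))
```

In a sketch like this, a very small SRM p-value would suggest checking the assignment mechanism before trusting the significance result, and an observed sample below `required_n_per_arm` would point toward an 'extend' rather than a 'ship' or 'stop' recommendation. The thresholds for those calls are a judgment left to the analyst.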
Repository Stats
- Stars: 10,763
- Forks: 1,244
- Open Issues: 13
- Language: Not provided
- Default Branch: main
- Sync Status: Idle
- Last Synced: Apr 29, 2026, 02:23 PM