Data Analysis

ab-test-analysis

Analyze A/B test results with statistical rigor. Includes significance testing, sample size validation, guardrail monitoring, and actionable recommendations (ship/extend/stop) using Python scripts.

Introduction

This skill acts as a data-driven partner for product managers and analysts tasked with evaluating experiment outcomes. It provides a structured workflow for interpreting A/B test data, moving beyond surface-level metrics to determine if a variant's performance is statistically significant, practically meaningful, and safe for production. The skill is designed for scenarios where you need to move from raw data to a clear 'ship', 'extend', or 'stop' decision based on rigorous statistical standards.

  • Perform statistical significance testing using two-tailed z-tests or chi-squared tests to calculate p-values and confidence intervals.

  • Validate sample size requirements using power analysis (power = 1 − β, where β is the Type II error rate) to ensure experiments are not underpowered.

  • Monitor guardrail metrics like revenue, engagement, or page load times to ensure positive primary metric lifts don't come at the cost of overall system health.

  • Calculate relative lift and assess whether observed trends are transient or persistent.

  • Automatically generate and execute Python scripts to handle raw CSV, Excel, or analytics export files for precise numerical analysis.

  • Detect Sample Ratio Mismatch (SRM) and account for potential novelty or primacy effects that could skew interpretation.

  • Inputs: Experiment hypothesis, primary and guardrail metrics, traffic split ratios, duration data, and optional raw data files (CSV/Excel).

  • Outputs: A structured analysis report featuring a summary table of metrics, statistical validation results, and an evidence-based recommendation for the experiment's next steps.

  • Target Audience: Product Managers, Data Analysts, and Growth Leads performing rapid iteration and validation.

  • Constraints: Requires clear definition of metrics and hypothesis prior to analysis to ensure validity; requires appropriate data access to perform automated Python processing; results are bounded by the input data quality and experimental design integrity.

  • Integration: This skill is part of the pm-data-analytics plugin and is frequently used alongside north-star metrics definition and cohort analysis workflows to ensure holistic product strategy.
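As an illustration of the significance-testing and relative-lift steps listed above, here is a minimal sketch of a two-tailed, two-proportion z-test using only the Python standard library. The function name `two_proportion_ztest`, its parameters, and the conversion counts are hypothetical and chosen for this example; they are not taken from the skill's own scripts.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-tailed z-test for a difference in conversion rates (B minus A).

    Hypothetical helper for illustration; not the skill's actual code.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval on the difference.
    se_diff = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "rate_a": p_a,
        "rate_b": p_b,
        "relative_lift": diff / p_a,
        "z": z,
        "p_value": p_value,
        "ci": (diff - z_crit * se_diff, diff + z_crit * se_diff),
        "significant": p_value < alpha,
    }


# Made-up counts: 500/10,000 conversions in control, 560/10,000 in the variant.
result = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
for key, value in result.items():
    print(f"{key}: {value}")
```

With these made-up counts the variant shows a 12% relative lift, yet the p-value lands slightly above 0.05 and the confidence interval includes zero. This is the kind of case where an 'extend' recommendation may be more defensible than 'ship'.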
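The sample size validation and SRM detection steps can be sketched in a similar way. The code below assumes the standard normal-approximation formula for a two-proportion test and a chi-squared goodness-of-fit test with one degree of freedom. The names `required_sample_size` and `srm_p_value`, the default values, and the 0.001 SRM threshold are assumptions made for this example rather than settings documented by the skill.

```python
from math import ceil, erfc, sqrt
from statistics import NormalDist


def required_sample_size(baseline, mde_rel, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-tailed two-proportion test.

    baseline: control conversion rate; mde_rel: minimum detectable relative lift.
    Hypothetical helper for illustration.
    """
    p1 = baseline
    p2 = baseline * (1 + mde_rel)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def srm_p_value(n_a, n_b, expected_share_a=0.5):
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for assignment counts."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # With 1 degree of freedom, the chi-squared survival function is erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))


# Made-up inputs: 5% baseline rate, 10% relative minimum detectable effect.
n_per_variant = required_sample_size(baseline=0.05, mde_rel=0.10)
print(f"Users needed per variant: {n_per_variant}")

# Made-up assignment counts for an intended 50/50 split.
p_srm = srm_p_value(n_a=10_000, n_b=10_500)
SRM_THRESHOLD = 0.001  # assumed strict cutoff for this sketch
print(f"SRM p-value: {p_srm:.5f}, mismatch flagged: {p_srm < SRM_THRESHOLD}")
```

If the observed sample per variant is well below `n_per_variant`, a non-significant result says little about whether the effect exists. If the SRM check flags a mismatch, the assignment mechanism should be investigated before any metric comparison is trusted.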

Repository Stats

Stars: 10,763
Forks: 1,244
Open Issues: 13
Language: Not provided
Default Branch: main
Sync Status: Idle
Last Synced: Apr 29, 2026, 02:23 PM