single-cell-rna-qc
Automated single-cell RNA-seq quality control pipeline following scverse best practices. Performs MAD-based outlier detection, cell filtering, and diagnostic visualization for .h5ad and .h5 datasets.
Introduction
This skill provides a standardized, automated workflow for quality control (QC) of single-cell RNA-seq data. It is designed for researchers working in the scverse ecosystem, meaning scanpy and AnnData-based Python workflows. It is intended for bioinformatics data scientists and researchers who need to process 10x Genomics (.h5) or pre-processed (.h5ad) data efficiently. Because it packages best-practice methodology in a single workflow, the skill helps ensure data integrity before downstream steps such as normalization, clustering, or trajectory inference.
- Executes an end-to-end QC pipeline that computes per-cell metrics: total counts, number of detected genes, and the percentage of counts coming from mitochondrial, ribosomal, and hemoglobin genes.
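- Implements Median Absolute Deviation (MAD)-based filtering to identify and remove low-quality or outlier cells. A cell is flagged when a QC metric lies more than a set number of MADs from that metric's median, so thresholds adapt to each dataset instead of relying on fixed, arbitrary cutoffs.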
- Generates visual diagnostic reports, including pre-filtering distribution histograms, threshold overlays, and post-filtering metric summaries for assessing the impact of filtering.
- Supports modular use: run the complete automated pipeline for standard workflows, or call individual functions for specific, non-standard experimental designs.
- Builds on standard bioinformatics and scientific Python libraries (scanpy, anndata, numpy, scipy, matplotlib, and seaborn) for data handling and plotting.
- For most users, the recommended workflow is the complete automated pipeline in the provided qc_analysis.py script, which handles file loading and computes thresholds automatically.
- Input should be a standard single-cell count matrix. If your data do not follow the standard human or mouse gene-naming patterns (for example, the MT- and mt- prefixes for mitochondrial genes), configure the feature patterns to match your organism.
- Outputs are organized by dataset and document the filtering process, which supports reproducibility and methods reporting.
- This skill is built to be modular; if your workflow requires conditional filtering based on cell types or subset-specific parameters, use the qc_core and qc_plotting modules directly to maintain granular control over the analysis pipeline.
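The organism-specific feature patterns mentioned in the list above can be illustrated with a minimal sketch that flags mitochondrial, ribosomal, and hemoglobin genes by name. The `FEATURE_PATTERNS` table and `flag_feature_groups` function are hypothetical and are not the skill's actual configuration. The regexes are assumptions based on common human (`MT-`, `RPS`/`RPL`, `HB`-prefixed) and mouse (`mt-`, `Rps`/`Rpl`, `Hb`-prefixed) naming conventions, so check them against your reference annotation.

```python
import re

import numpy as np

# Hypothetical pattern table (an assumption, not the skill's config):
# common human and mouse prefixes for mitochondrial, ribosomal, and
# hemoglobin genes. Add an entry or pass `patterns` for other organisms.
FEATURE_PATTERNS = {
    "human": {"mt": r"^MT-", "ribo": r"^RP[SL]", "hb": r"^HB[^P]"},
    "mouse": {"mt": r"^mt-", "ribo": r"^Rp[sl]", "hb": r"^Hb[^p]"},
}


def flag_feature_groups(gene_names, organism="human", patterns=None):
    """Return a dict of boolean masks over genes, one per feature group."""
    patterns = patterns or FEATURE_PATTERNS[organism]
    names = list(gene_names)
    return {
        # re.match anchors at the start of each gene name.
        group: np.array([re.match(rx, g) is not None for g in names], dtype=bool)
        for group, rx in patterns.items()
    }
```

Prefix matching is a heuristic. The hemoglobin pattern here excludes `HBP1` but would still match non-hemoglobin genes such as `HBEGF`, so an explicit gene list may be safer in some cases.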
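The per-cell QC metrics listed above can be sketched in NumPy for a dense cells × genes count matrix. The function name `compute_qc_metrics` is hypothetical. Its output keys mirror the `obs` columns that scanpy's `sc.pp.calculate_qc_metrics` writes when called with `qc_vars`, such as `total_counts`, `n_genes_by_counts`, and `pct_counts_mt`. On a real AnnData object, that scanpy call is the usual choice. This sketch assumes a dense array and does not handle sparse matrices.

```python
import numpy as np


def compute_qc_metrics(counts, gene_masks):
    """Per-cell QC metrics from a dense cells x genes count matrix.

    gene_masks maps a group name (e.g. "mt") to a boolean mask over genes.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)
    metrics = {
        "total_counts": total,
        # Number of genes with at least one count in each cell.
        "n_genes_by_counts": (counts > 0).sum(axis=1),
    }
    # Avoid division by zero for cells with no counts (their pct becomes 0).
    safe_total = np.where(total > 0, total, 1.0)
    for group, mask in gene_masks.items():
        group_counts = counts[:, np.asarray(mask, dtype=bool)].sum(axis=1)
        metrics[f"pct_counts_{group}"] = 100.0 * group_counts / safe_total
    return metrics
```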
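The MAD-based filtering described in the list above reduces to a short rule. First compute a metric's median, then compute the median of the absolute deviations from that median (the MAD), and finally flag values outside median ± `nmads` × MAD. The sketch below uses an unscaled MAD and a default of 5 MADs. Both are illustrative assumptions, not the skill's documented defaults.

```python
import numpy as np


def is_outlier(values, nmads=5.0):
    """Flag values more than `nmads` median absolute deviations from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    # Unscaled MAD: median of absolute deviations from the median.
    mad = np.median(np.abs(values - median))
    lower, upper = median - nmads * mad, median + nmads * mad
    return (values < lower) | (values > upper)
```

If more than half of the cells share the same value, the MAD is zero and every cell that differs from the median is flagged, so it is worth inspecting the MAD before filtering. Flags computed for several metrics can be combined with a logical OR, and the complement gives the retained cells.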
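The post-filtering metric summaries mentioned in the list above can be approximated numerically by comparing cell counts and per-metric medians before and after filtering. The `filtering_summary` helper below is a hypothetical sketch, not part of the skill's qc_plotting module. It is meant as a text-based complement to the diagnostic plots.

```python
import numpy as np


def filtering_summary(metrics, keep):
    """Compare cell counts and per-metric medians before and after filtering.

    metrics: dict mapping metric name -> per-cell values.
    keep: boolean mask, True for cells retained by the filter.
    """
    keep = np.asarray(keep, dtype=bool)
    n_before, n_after = int(keep.size), int(keep.sum())
    summary = {
        "n_cells_before": n_before,
        "n_cells_after": n_after,
        "pct_removed": 100.0 * (n_before - n_after) / n_before if n_before else 0.0,
    }
    for name, values in metrics.items():
        values = np.asarray(values, dtype=float)
        summary[f"{name}_median_before"] = float(np.median(values))
        # Median over retained cells only; NaN if every cell was removed.
        summary[f"{name}_median_after"] = (
            float(np.median(values[keep])) if n_after else float("nan")
        )
    return summary
```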
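For the subset-specific filtering described in the last point of the list, one option is to apply the MAD rule separately within each group, such as cell-type labels or sample IDs. This prevents a population with naturally higher counts from being flagged against the global median. `groupwise_outliers` is a hypothetical helper and does not reflect the qc_core API. It assumes one group label per cell and uses an unscaled MAD.

```python
import numpy as np


def groupwise_outliers(values, groups, nmads=5.0):
    """Apply MAD-based outlier detection separately within each group label."""
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    flags = np.zeros(values.shape[0], dtype=bool)
    for label in np.unique(groups):
        idx = groups == label
        v = values[idx]
        # Median and unscaled MAD computed within this group only.
        median = np.median(v)
        mad = np.median(np.abs(v - median))
        flags[idx] = (v < median - nmads * mad) | (v > median + nmads * mad)
    return flags
```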