Data Analysis

single-cell-rna-qc

Automated single-cell RNA-seq quality control pipeline following scverse best practices. Performs MAD-based outlier detection, cell filtering, and diagnostic visualization for .h5ad and .h5 datasets.

Introduction

This skill provides a standardized, automated workflow for quality control (QC) of single-cell RNA-seq data. It targets researchers and bioinformatics data scientists who work in the scverse ecosystem (scanpy and AnnData-based Python workflows) and need to process 10x Genomics (.h5) or pre-processed (.h5ad) files. By centralizing best-practice methodology, it helps ensure data integrity before downstream steps such as normalization, clustering, or trajectory inference.

  • Executes an end-to-end QC pipeline that computes metrics including total counts, number of detected genes, and the percentage of mitochondrial, ribosomal, and hemoglobin gene expression.

  • Implements Median Absolute Deviation (MAD)-based filtering, which derives outlier thresholds from each dataset's own metric distributions instead of relying on fixed cutoffs, to identify and remove low-quality cells.

  • Generates visual diagnostic reports, including pre-filtering distribution histograms, threshold overlays, and post-filtering metric summaries, so the impact of filtering can be assessed.

  • Supports modular operations, allowing users to choose between a complete automated pipeline for standard workflows or custom function calls for specific, non-standard experimental designs.

  • Uses standard scientific Python libraries: scanpy, anndata, numpy, and scipy for data manipulation, and matplotlib and seaborn for publication-quality graphics.

  • The recommended workflow for most users is the complete automated pipeline using the provided qc_analysis.py script, which handles file loading and automatic threshold generation.

  • Expects standard single-cell count matrices as input. If your organism does not follow the default human/mouse gene nomenclature (e.g., for mitochondrial genes), configure the feature patterns to match your data.

  • Organizes outputs by dataset, documenting the filtering process to support reproducibility and methods reporting.

  • For workflows that need conditional filtering by cell type or subset-specific parameters, call the qc_core and qc_plotting modules directly to keep granular control over each step of the analysis.
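
The metric computation and MAD-based filtering described in the list above can be sketched in plain numpy. This is a minimal illustration, not the skill's qc_core implementation: the function names (pct_counts, is_outlier), the human-style gene-name patterns, the toy count matrix, and the MAD multipliers (5 for log-transformed total counts, 3 for mitochondrial percentage) are all assumptions chosen for the example.

```python
import re

import numpy as np

# Assumed gene-name patterns for human data; other organisms may differ
# (mouse mitochondrial genes, for example, are usually prefixed "mt-").
MITO_PATTERN = re.compile(r"^MT-")
RIBO_PATTERN = re.compile(r"^RP[SL]")
HB_PATTERN = re.compile(r"^HB[^(P)]")


def pct_counts(counts, gene_names, pattern):
    """Percentage of each cell's counts that come from genes matching `pattern`."""
    mask = np.array([bool(pattern.match(g)) for g in gene_names])
    total = counts.sum(axis=1)
    # Replace zero totals with 1 so empty cells yield 0% instead of a division error.
    safe_total = np.where(total > 0, total, 1)
    return 100.0 * counts[:, mask].sum(axis=1) / safe_total


def is_outlier(values, n_mads):
    """Flag values lying more than `n_mads` MADs away from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return np.abs(values - median) > n_mads * mad


# Toy count matrix (hypothetical data): rows are cells, columns are genes.
gene_names = ["MT-CO1", "RPS3", "HBB", "ACTB"]
counts = np.array(
    [
        [5, 20, 0, 75],
        [6, 24, 0, 90],
        [4, 18, 1, 67],
        [6, 20, 0, 74],
        [60, 10, 0, 30],  # mitochondria-dominated cell
        [4, 22, 1, 83],
    ]
)

# Per-cell QC metrics.
total_counts = counts.sum(axis=1)
n_genes = (counts > 0).sum(axis=1)
pct_mt = pct_counts(counts, gene_names, MITO_PATTERN)
pct_ribo = pct_counts(counts, gene_names, RIBO_PATTERN)
pct_hb = pct_counts(counts, gene_names, HB_PATTERN)

# A cell is dropped if it is an outlier on either metric.
outlier = is_outlier(np.log1p(total_counts), n_mads=5) | is_outlier(pct_mt, n_mads=3)
keep = ~outlier
print(keep)
```

On this toy matrix only the fifth cell, with 60% mitochondrial counts, is flagged. Note that when more than half of the values are identical the MAD is zero and any deviation counts as an outlier, so very small or highly uniform datasets may need a fallback threshold.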

Repository Stats

Stars: 24
Forks: 3
Open Issues: 0
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: May 3, 2026, 08:58 PM