single-cell-rna-qc

Introduction

This skill provides a standardized, automated workflow for quality control (QC) of single-cell RNA-seq data in the scverse ecosystem, that is, Python workflows built on scanpy and AnnData. It is intended for bioinformatics data scientists and researchers who need to process 10x Genomics (.h5) output or existing AnnData (.h5ad) files efficiently. By applying best-practice QC methodology in one place, the skill helps ensure that only high-quality cells reach downstream steps such as normalization, clustering, or trajectory inference.

Runs an end-to-end QC pipeline that computes per-cell metrics: total counts, number of detected genes, and the percentage of counts from mitochondrial, ribosomal, and hemoglobin genes.
Applies median absolute deviation (MAD)-based filtering: cells whose metrics lie more than a set number of MADs from the dataset median are flagged as low-quality outliers, so thresholds adapt to each dataset instead of relying on fixed, arbitrary cutoffs.
Generates visual diagnostic reports, including pre-filtering distribution histograms, threshold overlays, and post-filtering metric summaries, so you can assess the impact of filtering.
Supports modular use: run the complete automated pipeline for standard workflows, or call individual functions for non-standard experimental designs.
Builds on standard libraries: scanpy, anndata, numpy, and scipy for data handling, and matplotlib and seaborn for publication-quality plots.
For most users, the recommended workflow is the complete automated pipeline in the provided qc_analysis.py script, which handles file loading and computes filtering thresholds automatically.
Input is a standard single-cell count matrix. If your gene nomenclature differs from the standard human/mouse conventions (for example, the MT- and mt- prefixes for human and mouse mitochondrial genes), configure the feature patterns to match your organism.
Outputs are organized by dataset and document each filtering step, which supports reproducibility and methods reporting.
Because the skill is modular, workflows that need conditional filtering (for example, by cell type or with subset-specific parameters) can call the qc_core and qc_plotting modules directly for finer control over each step.
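The organism-specific feature patterns mentioned above can be illustrated with a small regex helper. This is a minimal sketch, not the skill's actual configuration: the names flag_gene_groups and DEFAULT_PATTERNS and the exact regexes are assumptions. Matching is case-insensitive so that both the human MT- and mouse mt- mitochondrial prefixes match. The hemoglobin pattern covers human symbols only (mouse symbols such as Hba-a1 need their own pattern), and the ribosomal prefix is a heuristic that also catches a few non-ribosomal genes such as RPS6KA1.

```python
import re

# Hypothetical default patterns (an assumption, not the skill's real config).
# Matched case-insensitively, so human "MT-CO1" and mouse "mt-Co1" both match.
DEFAULT_PATTERNS = {
    "mt": r"^MT-",                # mitochondrial genes
    "ribo": r"^RP[SL]",           # ribosomal protein genes (RPS*/RPL*), heuristic
    "hb": r"^HB[ABDEGMQZ]\d*$",   # human hemoglobin genes (HBA1, HBB, HBG2, ...)
}


def flag_gene_groups(gene_names, patterns=DEFAULT_PATTERNS):
    """Return {group: [bool per gene]} marking genes that match each pattern."""
    compiled = {g: re.compile(p, re.IGNORECASE) for g, p in patterns.items()}
    return {
        group: [bool(rx.search(name)) for name in gene_names]
        for group, rx in compiled.items()
    }
```

For a non-standard organism, pass a different patterns dictionary instead of editing the defaults.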
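In scanpy, the per-cell metrics listed above are usually computed with sc.pp.calculate_qc_metrics(adata, qc_vars=["mt", "ribo", "hb"], inplace=True), where each qc_vars entry names a boolean column in adata.var. To show the arithmetic behind those metrics, here is a numpy-only sketch for a dense cells x genes matrix. The function name qc_metrics is hypothetical; the output keys mirror scanpy's column names. Sparse matrices would need sparse-aware sums.

```python
import numpy as np


def qc_metrics(counts, gene_masks):
    """Compute per-cell QC metrics from a dense cells x genes count matrix.

    counts: array of shape (n_cells, n_genes)
    gene_masks: dict mapping group name -> boolean sequence of length n_genes
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum(axis=1)            # total counts per cell
    n_genes = (counts > 0).sum(axis=1)    # genes with at least one count per cell
    metrics = {"total_counts": total, "n_genes_by_counts": n_genes}
    # Use 1 as the denominator for empty cells so percentages avoid division by zero.
    safe_total = np.where(total > 0, total, 1.0)
    for group, mask in gene_masks.items():
        group_counts = counts[:, np.asarray(mask, dtype=bool)].sum(axis=1)
        metrics[f"pct_counts_{group}"] = 100.0 * group_counts / safe_total
    return metrics
```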
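The MAD rule described above flags a value x as an outlier when |x - median| > k * MAD, where MAD is the median of |x - median| over all cells. A minimal numpy sketch follows; the function name and the default k = 5 are assumptions, not the skill's documented settings. The rule is typically applied to log-transformed counts and gene numbers. Note that a MAD of zero makes every value that differs from the median an outlier, so degenerate metrics deserve a check first.

```python
import numpy as np


def mad_outliers(values, n_mads=5.0):
    """Flag values more than n_mads median absolute deviations from the median."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))  # unscaled MAD
    # Two-sided test: both unusually low and unusually high values are flagged.
    return np.abs(values - median) > n_mads * mad
```

Cells flagged on any metric, for example mad_outliers(np.log1p(total_counts)) | mad_outliers(np.log1p(n_genes)), would then be removed.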
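A threshold-overlay plot like those in the diagnostic reports can be sketched with matplotlib: a histogram of one metric with vertical lines at median ± k * MAD. The function name, bin count, and styling are assumptions; the skill's own qc_plotting output may look different.

```python
import numpy as np
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def plot_metric_with_thresholds(values, n_mads=5.0, label="log1p(total_counts)"):
    """Histogram of one QC metric with its lower/upper MAD thresholds overlaid."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    lower, upper = median - n_mads * mad, median + n_mads * mad
    fig, ax = plt.subplots(figsize=(5, 3))
    ax.hist(values, bins=50, color="0.6")
    # Dashed lines mark the filtering thresholds on either side of the median.
    ax.axvline(lower, color="red", linestyle="--", label=f"-{n_mads:g} MAD")
    ax.axvline(upper, color="red", linestyle="--", label=f"+{n_mads:g} MAD")
    ax.set_xlabel(label)
    ax.set_ylabel("cells")
    ax.legend()
    return fig, (lower, upper)
```

Calling fig.savefig(path, dpi=150) on the returned figure writes it to disk for a report.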
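The conditional, subset-specific filtering mentioned above often means computing MAD thresholds separately within each sample or cell type, so that a population with naturally lower counts is judged against its own median rather than the global one. The source does not show the qc_core API, so the following is a standalone numpy sketch with a hypothetical function name, not a call into that module.

```python
import numpy as np


def groupwise_mad_outliers(values, groups, n_mads=5.0):
    """Apply MAD outlier detection separately within each group.

    Thresholds are computed per group (e.g. sample or cell type), so each
    cell is compared only with cells from the same group.
    """
    values = np.asarray(values, dtype=float)
    groups = np.asarray(groups)
    flags = np.zeros(values.shape, dtype=bool)
    for g in np.unique(groups):
        idx = groups == g
        v = values[idx]
        median = np.median(v)
        mad = np.median(np.abs(v - median))
        flags[idx] = np.abs(v - median) > n_mads * mad
    return flags
```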
