pyhealth
Comprehensive Python healthcare AI toolkit for clinical data processing, medical coding translation, and developing deep learning models like RETAIN and Transformers for EHR, physiological signals, and clinical prediction tasks.
Introduction
PyHealth is a Python library for healthcare artificial intelligence and clinical machine learning research. It provides a modular pipeline that carries clinical data from raw electronic health records (EHR) through to trained, evaluated predictive models. This skill is intended for clinical researchers, data scientists, and healthcare AI engineers working with complex medical datasets and diagnostic architectures. The library is described as optimized for performance, with data processing reported to be significantly faster than standard pandas workflows, which makes it a good fit for large clinical cohorts.
- Full support for major healthcare datasets including MIMIC-III, MIMIC-IV, eICU, and OMOP CDM.
- Robust medical coding translation engine covering ICD-9/10, NDC, RxNorm, ATC, and CCS systems for data standardization.
- Predefined clinical prediction task library covering mortality prediction, readmission risk, drug recommendation, and length of stay.
- Extensive library of 33+ specialized models including RETAIN, SafeDrug, GAMENet, StageNet, AdaCare, and state-of-the-art Transformer/GNN architectures for EHR.
- Advanced preprocessing capabilities for sequential events, physiological signals (EEG, ECG), medical imaging, and clinical text.
- Integrated training and evaluation modules with support for fairness metrics, calibration, interpretability, and uncertainty quantification.
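At its core, the coding translation feature listed above is a lookup from one vocabulary to another after normalizing the code format. The following is a minimal sketch of that idea using a tiny hand-written table. The `ICD9CM_TO_CCS` table, `normalize`, and `crosswalk` are hypothetical names introduced here for illustration and are not PyHealth's API; the library ships its own mapping tables and interface.

```python
from typing import Dict, List

# Toy ICD-9-CM -> CCS table for illustration only (hypothetical, not PyHealth data).
# A real crosswalk contains thousands of rows.
ICD9CM_TO_CCS: Dict[str, List[str]] = {
    "4280": ["108"],   # congestive heart failure, unspecified
    "25000": ["49"],   # diabetes mellitus without complication
    "4019": ["98"],    # essential hypertension, unspecified
}


def normalize(code: str) -> str:
    """Strip dots and whitespace so '428.0' and '4280' compare equal."""
    return code.replace(".", "").strip().upper()


def crosswalk(code: str, table: Dict[str, List[str]]) -> List[str]:
    """Return every target code mapped from `code`, or an empty list if unmapped."""
    return table.get(normalize(code), [])


mapped = crosswalk("428.0", ICD9CM_TO_CCS)
unmapped = crosswalk("999.99", ICD9CM_TO_CCS)
```

Returning a list rather than a single code reflects that crosswalks between coding systems are often one-to-many, and returning an empty list for unmapped codes lets the caller decide whether to drop or flag them.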
- Invoke this skill when performing predictive modeling on EHR data, cross-walking between medical coding systems, or building clinical decision support tools.
- Format datasets appropriately before ingestion; the skill provides adapters for the standard EHR structure (Patients, Visits, Events).
- For physiological signal tasks, use the dedicated preprocessing utilities to handle sampling rates and temporal alignment before passing data to deep learning models.
- Use the model selection module to compare baseline statistical methods such as Logistic Regression with healthcare-specific neural networks.
- The toolkit handles high-dimensional sparse clinical data efficiently, but make sure your compute environment has adequate GPU memory when training Transformer-based architectures on longitudinal patient data.
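The Patients, Visits, Events structure mentioned above, together with a prediction task such as readmission risk, can be sketched as plain Python data classes. This is only an illustrative sketch: the `Event`, `Visit`, and `Patient` classes, the `readmission_samples` helper, and the 15-day window are assumptions made here and do not reproduce PyHealth's own classes or task definitions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class Event:
    code: str            # e.g. a diagnosis or drug code
    vocabulary: str      # e.g. "ICD9CM" or "NDC"
    timestamp: datetime


@dataclass
class Visit:
    visit_id: str
    admit_time: datetime
    discharge_time: datetime
    events: List[Event] = field(default_factory=list)


@dataclass
class Patient:
    patient_id: str
    visits: List[Visit] = field(default_factory=list)


def readmission_samples(patient: Patient, window_days: int = 15) -> List[Dict]:
    """Build one sample per visit that has a following visit.

    The label is 1 when the next admission starts within `window_days`
    of the current discharge, otherwise 0. The last visit yields no sample
    because its outcome is unknown.
    """
    visits = sorted(patient.visits, key=lambda v: v.admit_time)
    samples = []
    for current, following in zip(visits, visits[1:]):
        gap = following.admit_time - current.discharge_time
        samples.append({
            "patient_id": patient.patient_id,
            "visit_id": current.visit_id,
            "codes": [e.code for e in current.events],
            "label": int(gap <= timedelta(days=window_days)),
        })
    return samples


patient = Patient("p1", [
    Visit("v1", datetime(2020, 1, 1), datetime(2020, 1, 5),
          [Event("4280", "ICD9CM", datetime(2020, 1, 2))]),
    Visit("v2", datetime(2020, 1, 12), datetime(2020, 1, 14)),
    Visit("v3", datetime(2020, 6, 1), datetime(2020, 6, 3)),
])
samples = readmission_samples(patient)
```

Keeping events nested under visits, and visits under patients, means a task function only has to walk one patient's timeline to produce model-ready samples, which is the general shape of the adapters the guidance above refers to.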
Repository Stats
- Stars: 19,784
- Forks: 2,208
- Open Issues: 41
- Language: Python
- Default Branch: main