data-cleaning-pipeline-generator
Generates data cleaning pipelines for pandas/polars/PySpark, handling missing values, duplicates, outliers, type conversions, and validation.
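As a rough illustration of the kind of pipeline this description refers to, here is a minimal pandas sketch covering the five listed steps. It is not the skill's actual output: the function name `clean`, the `age` column, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pipeline; the 'age' column is illustrative."""
    # Duplicates: drop rows that are exact copies of an earlier row.
    out = df.drop_duplicates().copy()
    # Type conversion: coerce unparseable values to NaN instead of raising.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Missing values: fill numeric gaps with the column median.
    out["age"] = out["age"].fillna(out["age"].median())
    # Outliers: keep rows within 1.5 * IQR of the quartiles (assumed rule).
    q1, q3 = out["age"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out = out[out["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
    # Validation: fail loudly if an invariant is broken.
    assert out["age"].notna().all(), "age still has missing values"
    return out.reset_index(drop=True)


raw = pd.DataFrame(
    {
        "name": ["a", "b", "b", "c", "d", "e"],
        "age": ["30", "31", "31", "x", "29", "400"],
    }
)
cleaned = clean(raw)
```

The order of steps is a design choice: deduplicating before type conversion keeps the comparison on the raw values, and imputing before outlier filtering means the median is computed with the extreme value still present. A real pipeline might reorder these depending on the data.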
Load, validate, and preprocess weekly insurance policy CSV data with intelligent period detection and standardization.
A versatile data analysis assistant for loading datasets, performing statistical calculations, visualizing trends, and generating professional summary reports.
A modular data processing tool for cleaning, validating, and analyzing CSV files with support for custom transformations and automated dependency management.
Guidelines for curating high-quality datasets for LLM post-training (SFT/DPO/RLHF), covering data formats, quality filtering, and collection strategies.
Streamline your codebase by automatically removing redundant or obvious comments while preserving essential architectural and logic-focused documentation.
Research technical documentation and automatically generate ready-to-use software agent skills in markdown format.
Automated single-cell RNA-seq quality control pipeline following scverse best practices. Performs MAD-based outlier detection, cell filtering, and diagnostic visualization for .h5ad and .h5 datasets.
Implement robust server-side and client-side input validation using sanitization and allowlists to prevent injection attacks and ensure data integrity.
Data Analysis Specialist for EDA, statistical modeling, SQL queries, and Python-based visualization. Turn raw datasets into actionable insights through rigorous quantitative methods.
Preprocessing and cleaning astronomical light curves using Lightkurve. Tools for outlier removal, flattening, detrending, and quality flag handling for time-series analysis.
Audit, prune, and maintain vector memory for Clawdbot. Prevents token waste, clears junk data, and automates memory hygiene via LanceDB maintenance.