data_processor
A multi-paradigm ETL pipeline agent supporting batch and streaming data processing, schema inference, and configurable DAG-based transformations for heterogeneous data sources.
Preprocessing and cleaning astronomical light curves using Lightkurve. Tools for outlier removal, flattening, detrending, and quality flag handling for time-series analysis.
Classical machine learning with scikit-learn. Use for classification, regression, clustering, dimensionality reduction, preprocessing, model evaluation, and building robust ML pipelines in Python.
Data Analysis Specialist for EDA, statistical modeling, SQL queries, and Python-based visualization. Turn raw datasets into actionable insights through rigorous quantitative methods.
Load, validate, and preprocess weekly insurance policy CSV data with intelligent period detection and standardization.
Generates data cleaning pipelines for pandas/polars/PySpark, handling missing values, duplicates, outliers, type conversions, and validation.
A modular data processing tool for cleaning, validating, and analyzing CSV files with support for custom transformations and automated dependency management.
A versatile data analysis assistant for loading datasets, performing statistical calculations, visualizing trends, and generating professional summary reports.
Python toolkit for mass spectrometry data processing. Enables spectral file importing (mzML, MGF, MSP), metadata harmonization, peak filtering, and calculating spectral similarity scores (cosine, modified cosine) for metabolomics.
Build and orchestrate end-to-end MLOps pipelines covering data preparation, training, validation, and automated deployment.
Comprehensive Python healthcare AI toolkit for clinical data processing, medical code translation, and developing deep learning models such as RETAIN and Transformers for EHR data, physiological signals, and clinical prediction tasks.
Guidelines for curating high-quality datasets for LLM post-training (SFT/DPO/RLHF), covering data formats, quality filtering, and collection strategies.