data-engineer
Specialized data engineering agent for designing ETL/ELT pipelines, defining data schemas, managing data quality, and implementing robust ingestion workflows.
Read and analyze any data file (CSV, JSON, Parquet, Avro, Excel, etc.) or remote URL (S3, HTTPS) using DuckDB. Automatically detect file formats and preview/profile datasets.
Expert SQL agent for modern database systems, query optimization, HTAP environments, and data architecture patterns. Optimize performance, schema design, and analytical workloads effectively.
Create, manage, and debug dlt (data load tool) pipelines for ingesting data from APIs, databases, and custom sources into destinations like DuckDB, BigQuery, and Snowflake.
Implement production-grade data quality validation using Great Expectations, dbt tests, and data contracts to ensure reliable pipelines.
A multi-paradigm ETL pipeline agent supporting batch and streaming data processing, schema inference, and configurable DAG-based transformations for heterogeneous data sources.
Optimize Apache Spark jobs with partitioning strategies, memory management, shuffle tuning, and data skew mitigation for high-performance data processing pipelines.
Senior-level data engineering skill for building scalable data pipelines, ETL/ELT systems, and modern data infrastructure using Python, Spark, dbt, and Kafka.
Create, alter, and validate Snowflake semantic views using the Snowflake CLI.
Migrate standard PostgreSQL tables to TimescaleDB hypertables with optimized partitioning, chunking, and compression strategies for time-series data.
High-performance in-memory DataFrame library for Python and Rust. Features lazy evaluation, parallel execution, and an Apache Arrow backend for efficient ETL and data processing as a faster alternative to pandas.
Build read models and projections from event streams for CQRS, materialized views, and optimized query performance in event-sourced systems.
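As a rough illustration of the last entry, a projection can be sketched as a fold over an ordered event stream that produces a query-friendly read model. The event types (`OrderPlaced`, `OrderCancelled`), the `Event` class, and the `project_order_totals` function below are hypothetical names chosen for this sketch and are not taken from the listed skill.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, assumed for illustration only.
    kind: str          # "OrderPlaced" or "OrderCancelled"
    order_id: str
    customer: str
    amount: float = 0.0


def project_order_totals(events):
    """Fold an ordered event stream into a per-customer totals read model."""
    placed = {}                    # order_id -> (customer, amount) for open orders
    totals = defaultdict(float)    # customer -> sum of open order amounts
    for e in events:
        if e.kind == "OrderPlaced":
            placed[e.order_id] = (e.customer, e.amount)
            totals[e.customer] += e.amount
        elif e.kind == "OrderCancelled" and e.order_id in placed:
            # Reverse the effect of the original placement.
            customer, amount = placed.pop(e.order_id)
            totals[customer] -= amount
    return dict(totals)


events = [
    Event("OrderPlaced", "o1", "alice", 30.0),
    Event("OrderPlaced", "o2", "bob", 20.0),
    Event("OrderPlaced", "o3", "alice", 12.5),
    Event("OrderCancelled", "o1", "alice"),
]
read_model = project_order_totals(events)
print(read_model)
```

Because the read model is derived purely from the events, it can be discarded and rebuilt by replaying the stream, which is the property that makes projections useful on the query side of CQRS.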