spark-optimization
Optimize Apache Spark jobs with partitioning strategies, memory management, shuffle tuning, and data skew mitigation for high-performance data processing pipelines.
Migrate standard PostgreSQL tables to TimescaleDB hypertables with optimized partitioning, chunking, and compression strategies for time-series data.
Expert SQL agent for modern database systems, query optimization, HTAP environments, and data architecture patterns. Optimize performance, schema design, and analytical workloads effectively.
Python skill for high-performance storage of chunked N-dimensional arrays using Zarr, supporting cloud storage (S3/GCS), parallel I/O, and integration with NumPy, Dask, and Xarray.
Build read models and projections from event streams for CQRS, materialized views, and optimized query performance in event-sourced systems.
Specialized data engineering agent for designing ETL/ELT pipelines, defining data schemas, managing data quality, and implementing robust ingestion workflows.
High-performance in-memory DataFrame library for Python and Rust. Features lazy evaluation, parallel execution, and an Apache Arrow backend for efficient ETL and data processing as a faster alternative to pandas.
Read and analyze any data file (CSV, JSON, Parquet, Avro, Excel, etc.) or remote URL (S3, HTTPS) using DuckDB. Automatically detect file formats and preview/profile datasets.
A multi-paradigm ETL pipeline agent supporting batch and streaming data processing, schema inference, and configurable DAG-based transformations for heterogeneous data sources.
Manage dlt data pipelines and Temporal workflows for the SignalRoom marketing platform. Sync sources like Everflow, Redtrack, and S3 to Postgres, check status, and debug ingestion.
Automate GitHub issue triage by analyzing reports against the codebase, verifying technical claims, and providing expert-driven responses to resolve invalid issues.
Upstash Vector DB setup, semantic search, namespaces, and embedding models. Ideal for building high-performance vector search features in Next.js 16/Vercel projects.