spark-optimization
Optimize Apache Spark jobs with partitioning strategies, memory management, shuffle tuning, and data skew mitigation for high-performance data processing pipelines.
High-performance in-memory DataFrame library for Python and Rust. Features lazy evaluation, parallel execution, and an Apache Arrow backend for efficient ETL and data processing as a faster alternative to pandas.
Open-source infrastructure for reliable, multi-destination event delivery. Route webhooks to HTTP, SQS, RabbitMQ, Pub/Sub, EventBridge, or Kafka with built-in retries and observability.
Senior-level data engineering skill for building scalable data pipelines, ETL/ELT systems, and modern data infrastructure using Python, Spark, dbt, and Kafka.
A multi-paradigm ETL pipeline agent supporting batch and streaming data processing, schema inference, and configurable DAG-based transformations for heterogeneous data sources.
Build read models and projections from event streams for CQRS, materialized views, and optimized query performance in event-sourced systems.
Generates data cleaning pipelines for pandas/polars/PySpark, handling missing values, duplicates, outliers, type conversions, and validation.
Specialized data engineering agent for designing ETL/ELT pipelines, defining data schemas, managing data quality, and implementing robust ingestion workflows.
Orchestrate multi-agent swarms using agentic-flow for parallel task execution, dynamic topology, and intelligent coordination. Ideal for building distributed AI systems and scaling complex development workflows.
Train and manage neural networks in distributed E2B sandboxes using the Flow Nexus platform, supporting custom architectures like Transformers, LSTMs, and GANs.
High-performance document intelligence library for extracting text, tables, code, and metadata from 91+ file formats, with OCR and LLM-ready output.
Generate optimized SQL queries from natural language. Supports BigQuery, PostgreSQL, MySQL, and Snowflake. Analyze database schemas, interpret business requirements, and output ready-to-run queries with explanations.