senior-data-engineer

Senior-level data engineering skill for building scalable data pipelines, ETL/ELT systems, and modern data infrastructure with Python, Spark, dbt, and Kafka.

Introduction

The Senior Data Engineer skill provides expertise for designing, deploying, and maintaining production-grade data systems and AI/ML infrastructure. It is aimed at senior data engineers, architects, and MLOps professionals who manage complex, high-throughput environments and must maintain data quality, security, and scalability. It helps users automate data workflows and apply robust architectural patterns.

  • Advanced data pipeline orchestration using Airflow and custom Python scripts for reliable execution.

  • Comprehensive performance optimization techniques for ETL/ELT workflows to minimize latency and cloud infrastructure costs.

  • Expertise in distributed computing frameworks including Spark and Kafka for real-time processing and batch data ingestion.

  • Implementation of data governance, quality validation frameworks, and DataOps best practices to maintain pipeline integrity.

  • Support for modern data stack components, including dbt for transformation and data stores such as BigQuery, Snowflake, and PostgreSQL.

  • MLOps integration for model deployment, feature store management, and real-time inference monitoring, using MLflow and Prometheus.

  • Use this skill when starting new data architecture projects or refactoring legacy pipelines to meet modern performance targets, such as a P50 (median) latency under 50 ms.

  • Provide raw data configurations, SQL schema definitions, or descriptions of performance bottlenecks as input, and receive structured pipeline scripts or optimization strategies as output.

  • Ensure all deployments adhere to security and compliance standards, including PII handling and encryption protocols.

  • Follow test-driven development (TDD) and CI/CD best practices when making infrastructure changes, to keep availability high and error rates low.

  • Leverage the included reference documentation to align team practices with industry-standard patterns for system design and scalability.
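
The quality validation bullet above can be illustrated with a small rule-based check. This is a minimal sketch in plain Python, not the skill's actual framework: the `Rule` class, the `validate` function, and the field names `user_id` and `amount` are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Rule:
    """A named row-level check (hypothetical helper, not part of the skill)."""
    name: str
    check: Callable[[dict[str, Any]], bool]


def validate(rows: Iterable[dict[str, Any]], rules: list[Rule]) -> dict[str, int]:
    """Return the number of rows that fail each rule."""
    failures = {rule.name: 0 for rule in rules}
    for row in rows:
        for rule in rules:
            if not rule.check(row):
                failures[rule.name] += 1
    return failures


# Assumed example rules; real pipelines would derive these from the schema.
RULES = [
    Rule("user_id_present", lambda r: r.get("user_id") is not None),
    Rule(
        "amount_non_negative",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

sample_rows = [
    {"user_id": 1, "amount": 10.0},
    {"user_id": None, "amount": 5.0},   # fails user_id_present
    {"user_id": 3, "amount": -2.5},     # fails amount_non_negative
]

report = validate(sample_rows, RULES)
print(report)
```

A pipeline step could fail the run or quarantine the batch when any count in `report` exceeds an agreed threshold; the threshold itself is a policy decision outside this sketch.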
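
The P50 target mentioned in the list above can be checked from recorded latencies. The sketch below assumes the target refers to request or task latency in milliseconds; the function names and the sample values are illustrative and do not come from the skill.

```python
import statistics
from typing import Sequence


def p50_ms(latencies_ms: Sequence[float]) -> float:
    """Median (P50) of the recorded latencies, in milliseconds."""
    return statistics.median(latencies_ms)


def meets_target(latencies_ms: Sequence[float], target_ms: float = 50.0) -> bool:
    """True when the median latency is strictly below the target."""
    if not latencies_ms:
        raise ValueError("no latency samples recorded")
    return p50_ms(latencies_ms) < target_ms


# Made-up samples: one slow outlier does not move the median.
samples = [12.0, 48.0, 35.0, 220.0, 41.0]
print(p50_ms(samples), meets_target(samples))
```

Because the median ignores the slow tail (the 220 ms sample here), a P50 target is usually tracked alongside a higher percentile such as P95 or P99.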
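
For the PII handling requirement in the list above, one common approach is to pseudonymize sensitive fields with a keyed hash before data leaves the ingestion layer. This is a sketch under stated assumptions, not the skill's prescribed method: the field names in `PII_FIELDS`, the helper names, and the placeholder key are hypothetical, and a real key would come from a secret manager.

```python
import hashlib
import hmac
from typing import Any

# Assumed set of sensitive columns; adjust to the actual schema.
PII_FIELDS = frozenset({"email", "phone"})


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed SHA-256 digest: stable for joins, not reversible without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_record(
    record: dict[str, Any], key: bytes, pii_fields: frozenset = PII_FIELDS
) -> dict[str, Any]:
    """Return a copy of the record with PII fields replaced by digests."""
    return {
        name: pseudonymize(str(value), key)
        if name in pii_fields and value is not None
        else value
        for name, value in record.items()
    }


# Placeholder key for illustration only; never hard-code a real key.
demo_key = b"example-key-not-for-production"
raw = {"user_id": 7, "email": "a@example.com", "country": "DE"}
masked = mask_record(raw, demo_key)
print(masked)
```

The same input and key always produce the same digest, so masked columns can still be used as join keys across tables.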

Repository Stats

Stars: 16
Forks: 6
Open Issues: 1
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: May 3, 2026, 05:55 AM