pipeline
Manage dlt data pipelines and Temporal workflows for the SignalRoom marketing platform. Sync sources like Everflow, Redtrack, and S3 to Postgres, check status, and debug ingestion.
Introduction
The pipeline skill is a comprehensive toolkit for managing ELT processes within the SignalRoom marketing data platform. It acts as the primary interface for triggering, monitoring, and debugging data ingestion tasks that move information from marketing sources such as Everflow, Redtrack, and S3-hosted CSV files into a Supabase-backed PostgreSQL database. By leveraging dlt for data extraction and Temporal for durable, fault-tolerant workflow orchestration, this skill ensures that marketing data—including affiliate conversions, revenue, and ad spend—is processed reliably even during network or system disruptions. It is intended for data engineers and system operators who need to maintain pipeline integrity, manage scheduling, and troubleshoot ingestion failures in production environments.
Capabilities
- Executes ad-hoc dlt pipelines locally for testing and debugging via scripts/run_pipeline.py; a sketch of such a run follows.
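What a local run can look like with dlt's Python API is sketched below; the source module, pipeline name, and dataset name are illustrative assumptions, not the script's actual interface:

```python
import dlt

# Hypothetical import; real sources live under src/signalroom/sources/.
from signalroom.sources.everflow import everflow_source

# Build a pipeline that loads into the Supabase-backed Postgres database;
# connection credentials are read from the environment (see Usage Notes).
pipeline = dlt.pipeline(
    pipeline_name="everflow_test",
    destination="postgres",
    dataset_name="everflow_raw",
)

# Run the source and print the load summary (load IDs, package status).
load_info = pipeline.run(everflow_source())
print(load_info)
```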
- Triggers production Temporal workflows, with optional blocking waits and notification flags, via scripts/trigger_workflow.py; see the sketch below.
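Triggering a workflow with the Temporal Python SDK might look like the following; the address, namespace, workflow name, and task queue are placeholders, and the TLS/API-key options Temporal Cloud requires are omitted for brevity:

```python
import asyncio

from temporalio.client import Client

async def main() -> None:
    # Connect to Temporal Cloud (placeholder host and namespace).
    client = await Client.connect(
        "my-namespace.tmprl.cloud:7233",
        namespace="my-namespace",
    )

    # Start the workflow by name; id and task_queue are illustrative.
    handle = await client.start_workflow(
        "EverflowSyncWorkflow",
        id="everflow-sync-manual",
        task_queue="signalroom-pipelines",
    )

    # Optional blocking wait, mirroring the script's wait behavior.
    print(await handle.result())

asyncio.run(main())
```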
- Manages and audits automated synchronization schedules (e.g., hourly Everflow syncs or daily S3 batch processing); a schedule-creation sketch follows.
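For illustration, an hourly sync could be registered through the SDK's schedules API; the workflow and task-queue names are again assumptions:

```python
from datetime import timedelta

from temporalio.client import (
    Client,
    Schedule,
    ScheduleActionStartWorkflow,
    ScheduleIntervalSpec,
    ScheduleSpec,
)

async def create_hourly_sync(client: Client) -> None:
    # Register an hourly schedule that starts the (placeholder) sync workflow.
    await client.create_schedule(
        "everflow-hourly-sync",
        Schedule(
            action=ScheduleActionStartWorkflow(
                "EverflowSyncWorkflow",
                id="everflow-hourly-sync-run",
                task_queue="signalroom-pipelines",
            ),
            spec=ScheduleSpec(
                intervals=[ScheduleIntervalSpec(every=timedelta(hours=1))]
            ),
        ),
    )
```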
- Provides deep visibility into system status through integrated logging, Temporal Cloud UI links, and direct access to worker logs; recent runs can also be listed programmatically, as sketched below.
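A quick way to inspect recent executions from code, using the SDK's visibility API with a placeholder workflow type:

```python
from temporalio.client import Client

async def show_recent_runs(client: Client) -> None:
    # List executions of the (placeholder) sync workflow with their status.
    async for wf in client.list_workflows('WorkflowType="EverflowSyncWorkflow"'):
        print(wf.id, wf.status)
```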
- Configures and updates source registries within the SignalRoom architecture.
Usage Notes
- Always activate the virtual environment and ensure the .env file is configured with the necessary credentials before running commands.
- Use the --dry-run flag during testing to preview pipeline behavior without modifying the production database; one way such a flag is commonly wired is sketched below.
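A generic sketch of a dry-run switch (not necessarily how run_pipeline.py implements it): route the load to a throwaway local DuckDB file instead of production Postgres. The get_registered_source helper is hypothetical:

```python
import argparse

import dlt

# Hypothetical registry lookup (see the source-registration note below).
from signalroom.pipelines.runner import get_registered_source

def main() -> None:
    parser = argparse.ArgumentParser(description="Run a pipeline ad hoc.")
    parser.add_argument("source", help="registered source name, e.g. everflow")
    parser.add_argument("--dry-run", action="store_true",
                        help="preview behavior without writing to production")
    args = parser.parse_args()

    source = get_registered_source(args.source)

    # Dry run: load into a local DuckDB file rather than production Postgres.
    destination = "duckdb" if args.dry_run else "postgres"
    pipeline = dlt.pipeline(pipeline_name=args.source, destination=destination)
    print(pipeline.run(source))

if __name__ == "__main__":
    main()
```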
- For production troubleshooting, verify the Supabase connection settings (port 6543) and check the Temporal Cloud UI for workflow activity timeouts; a connectivity check is sketched below.
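A quick connectivity check against the database rules out networking and credential issues; a sketch using psycopg2, with placeholder environment-variable names:

```python
import os

import psycopg2

# Placeholder variable names; use whatever your .env actually defines.
conn = psycopg2.connect(
    host=os.environ["SUPABASE_DB_HOST"],
    port=6543,  # pooled-connection port, per the note above
    dbname=os.environ["SUPABASE_DB_NAME"],
    user=os.environ["SUPABASE_DB_USER"],
    password=os.environ["SUPABASE_DB_PASSWORD"],
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT 1")
    print("Connection OK:", cur.fetchone())
conn.close()
```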
- New sources must be defined in the src/signalroom/sources/ directory and registered in src/signalroom/pipelines/runner.py to be discoverable by the skill; both steps are sketched below.
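The repository's actual registration mechanism isn't documented here, so the following is a plausible sketch: a minimal dlt source module plus the kind of name-to-factory registry runner.py might hold:

```python
# src/signalroom/sources/acme.py -- hypothetical new source
import dlt

@dlt.source
def acme_source(api_key: str = dlt.secrets.value):
    @dlt.resource(write_disposition="append")
    def conversions():
        # Replace with real API calls; yielded dicts become table rows.
        yield {"conversion_id": 1, "revenue": 12.5}

    return conversions

# src/signalroom/pipelines/runner.py -- hypothetical registry shape
SOURCES = {
    "acme": acme_source,
    # "everflow": everflow_source, "redtrack": redtrack_source, ...
}
```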
Inputs and Outputs
- Inputs include valid API keys for the integrated platforms and properly formatted environment variables for the S3, Everflow, and Redtrack connections; a fail-fast check is sketched below.
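Validating required configuration up front saves a debugging round-trip; the variable names below are placeholders for whatever the .env actually defines:

```python
import os

# Placeholder names; substitute the project's real variable names.
REQUIRED_VARS = [
    "EVERFLOW_API_KEY",
    "REDTRACK_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
]

missing = [name for name in REQUIRED_VARS if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {missing}")
```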
- Output typically includes pipeline load IDs, row counts, and status updates for managed workflows, as in the sketch below.
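dlt exposes load IDs and row counts on the objects a run returns; a sketch of reading them (attribute names follow dlt's public API, but verify against your installed version):

```python
# Continuing from the local-run sketch above:
load_info = pipeline.run(everflow_source())

# Load IDs for the packages produced by this run.
print("Load IDs:", load_info.loads_ids)

# Per-table row counts from the most recent normalize step.
print("Row counts:", pipeline.last_trace.last_normalize_info.row_counts)
```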