
vm-infrastructure-ops

Troubleshoot and manage the GCP e2-micro VM running the eth-realtime-collector. Handle systemd failures, network connectivity issues, and real-time data stream monitoring for Ethereum network data.

Introduction

This skill provides an operational toolkit for the eth-realtime-collector running on a GCP e2-micro instance. It is intended for engineers who maintain Ethereum network data pipelines and are responsible for keeping the collector available. It streamlines triage of common production issues: service crashes, gRPC metadata validation errors, DNS resolution failures, and gaps in downstream ClickHouse data. Standardized workflows for status checks, log analysis, and recovery reduce downtime when the infrastructure is unstable.
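The status, log, restart, and reset workflows map onto standard `gcloud`, `systemctl`, and `journalctl` commands. The sketch below only builds the argument lists and prints them (dry run by default), so it can be reviewed safely before anything touches the VM. The project ID and zone come from the prerequisites listed in this section; the VM name `eth-realtime-collector` and the systemd unit name `eth-collector` are assumptions, so substitute the real values for your deployment.

```python
"""Build gcloud commands for the collector's triage workflows (sketch).

Assumptions: INSTANCE and UNIT are hypothetical names; PROJECT and ZONE
are taken from the skill's stated prerequisites.
"""
import shlex
import subprocess
from typing import List

PROJECT = "eonlabs-ethereum-bq"
ZONE = "us-east1-b"
INSTANCE = "eth-realtime-collector"  # hypothetical VM name
UNIT = "eth-collector"  # hypothetical systemd unit name


def _ssh(remote_cmd: str) -> List[str]:
    """Wrap a remote shell command in `gcloud compute ssh`."""
    return [
        "gcloud", "compute", "ssh", INSTANCE,
        f"--project={PROJECT}", f"--zone={ZONE}",
        f"--command={remote_cmd}",
    ]


def status_cmd() -> List[str]:
    # `systemctl is-active` prints a state such as "active" or "failed".
    return _ssh(f"systemctl is-active {UNIT}")


def logs_cmd(lines: int = 200) -> List[str]:
    # Last N journal lines for the unit, without a pager.
    return _ssh(f"journalctl -u {UNIT} -n {lines} --no-pager")


def restart_cmd() -> List[str]:
    return _ssh(f"sudo systemctl restart {UNIT}")


def vm_state_cmd() -> List[str]:
    # Reads the VM state (e.g. RUNNING) from the Compute API; needs no SSH.
    return [
        "gcloud", "compute", "instances", "describe", INSTANCE,
        f"--project={PROJECT}", f"--zone={ZONE}",
        "--format=value(status)",
    ]


def reset_cmd() -> List[str]:
    # Hard reset: last resort when SSH/network access to the VM is lost.
    return [
        "gcloud", "compute", "instances", "reset", INSTANCE,
        f"--project={PROJECT}", f"--zone={ZONE}",
    ]


def run(cmd: List[str], dry_run: bool = True) -> str:
    """Print the command; execute it only when dry_run is False."""
    print(shlex.join(cmd))
    if dry_run:
        return ""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    for cmd in (status_cmd(), logs_cmd(50), vm_state_cmd()):
        run(cmd)  # dry run by default: prints, does not execute
```

Defaulting to `dry_run=True` keeps the reset path opt-in; in line with the operational best practice, `reset_cmd()` would only be executed after the status check and log tail have ruled out a service-level fault.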

  • Real-time monitoring of eth-collector service status and systemd lifecycle management.

  • Log streaming with journalctl for quickly debugging "connection refused" errors, gRPC errors, and "metadata server unreachable" failures.

  • Automated service recovery through managed restart scripts that include pre-deployment checks.

  • Emergency infrastructure interventions, including hard VM resets when network connectivity is entirely lost.

  • Data flow verification pipelines to confirm that blocks are successfully reaching the ClickHouse database.

  • Target Audience: DevOps engineers, SREs, and data engineers managing blockchain ingestion pipelines.

  • Use Case: Resolve 'no blocks received' alerts by identifying service failure patterns or underlying GCP networking issues.

  • Prerequisites: Valid GCP project access (eonlabs-ethereum-bq), configured gcloud CLI, and access to the zone us-east1-b.

  • Operational Best Practice: Always check service health via the status workflow before attempting a full VM reset. Use the provided log tails to distinguish between transient service errors and persistent infrastructure failures.

  • Constraints: Targeted specifically at the eth-realtime-collector deployment; ensure proper environment credentials are loaded before executing verification scripts.
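The best practice of separating transient service errors from persistent infrastructure failures can be expressed as a small decision function. This is a heuristic sketch, not the skill's actual logic: the regex patterns, the rule that DNS or metadata-server failures point at VM networking, and the 600-second staleness threshold are all assumptions. The `seconds_since_last_block` input is assumed to come from a ClickHouse freshness query such as `SELECT dateDiff('second', max(timestamp), now()) FROM <blocks table>`, where the table and column names are placeholders.

```python
"""Heuristic triage for 'no blocks received' alerts (sketch).

All patterns and thresholds here are illustrative assumptions.
"""
import re
from collections import Counter
from typing import Iterable

# Hypothetical log patterns; adjust to the collector's real log format.
SERVICE_PATTERNS = {
    "connection_refused": re.compile(r"connection refused", re.I),
    "grpc_error": re.compile(r"\bgrpc\b", re.I),
}
INFRA_PATTERNS = {
    "metadata_unreachable": re.compile(r"metadata server.*unreachable", re.I),
    "dns_failure": re.compile(r"name resolution|no such host", re.I),
}
ALL_PATTERNS = {**SERVICE_PATTERNS, **INFRA_PATTERNS}

STALE_AFTER_SECONDS = 600  # assumed alert threshold


def classify(lines: Iterable[str]) -> Counter:
    """Count how many journal lines match each known failure pattern."""
    counts: Counter = Counter()
    for line in lines:
        for name, pattern in ALL_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def recommend(service_active: bool, lines: Iterable[str],
              seconds_since_last_block: float) -> str:
    """Return 'ok', 'restart_service', or 'check_vm_network'."""
    if service_active and seconds_since_last_block <= STALE_AFTER_SECONDS:
        # Blocks are still arriving, so any logged errors were transient.
        return "ok"
    counts = classify(lines)
    if any(counts[name] for name in INFRA_PATTERNS):
        # DNS/metadata failures suggest VM networking, not the process;
        # a hard VM reset is the escalation path if SSH also fails.
        return "check_vm_network"
    return "restart_service"
```

Checking freshness first means a healthy pipeline never escalates, and only network-level signatures route toward the VM reset path; everything else defaults to the cheaper service restart.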

Repository Stats

Stars: 0
Forks: 0
Open Issues: 0
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: May 3, 2026, 06:32 PM