Engineering

cicd-diagnostics

Diagnose dotCMS CI/CD GitHub Actions failures, including PR builds, merge queue issues, and nightly test reports.

Introduction

The cicd-diagnostics skill is an engineering toolkit for senior platform engineers who troubleshoot and resolve CI/CD pipeline failures in the dotCMS/core repository. It provides a structured, evidence-based workflow that moves from raw failure logs to clear root causes. This prevents a common anti-pattern: rushing into source code analysis before confirming whether a failure is a known issue, a flaky test, or an infrastructure problem.

  • Automatically identifies failed jobs and steps within GitHub Actions workflows (PRs, merge queues, trunk, and nightly runs).
  • Analyzes logs to extract stack traces, assertion failures, and infrastructure errors while filtering out noise from 'continue-on-error' steps.
  • Uses built-in diagnostic scripts for preflight checks, workspace management, and structured evidence gathering via the diagnose.py utility.
  • Searches existing GitHub issues for historical context on test failures, ensuring that known issues are identified before starting deep-dive investigations.
  • Provides advanced log analytics via evidence.py to classify failures as new defects, flaky tests, or environment-related timeouts.
  • Supports comparisons between workflow types, for example spotting discrepancies between PR validation and the full merge queue test suite.
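
The first steps in the list above (finding failed jobs and steps, then sorting log text into broad failure categories) can be sketched roughly as follows. This is an illustrative sketch only, not the code in diagnose.py or evidence.py. It assumes the run data was already fetched with something like `gh run view <run-id> --json jobs` and parsed into a dict; the function names `failed_steps` and `classify` and the regex patterns in `SIGNALS` are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def failed_steps(run: Dict) -> List[Tuple[str, str]]:
    """Return (job name, step name) pairs whose conclusion is 'failure'.

    Assumes `run` has the shape {"jobs": [{"name", "conclusion", "steps": [...]}]}.
    """
    pairs = []
    for job in run.get("jobs", []):
        if job.get("conclusion") != "failure":
            continue  # skip jobs that passed, were skipped, or were cancelled
        for step in job.get("steps", []):
            if step.get("conclusion") == "failure":
                pairs.append((job["name"], step["name"]))
    return pairs


# Hypothetical signal patterns, checked in order. These are assumptions made
# for illustration and are not the rules the real tool applies.
SIGNALS = [
    ("infrastructure", re.compile(
        r"connection (refused|reset)|timed out|could not resolve host", re.I)),
    ("auth", re.compile(
        r"\b(401|403)\b|authentication failed|permission denied", re.I)),
    ("test-failure", re.compile(
        r"AssertionError|Tests run:.*Failures: [1-9]", re.I)),
]


def classify(log_text: str) -> str:
    """Return the label of the first matching signal, or 'unknown'."""
    for label, pattern in SIGNALS:
        if pattern.search(log_text):
            return label
    return "unknown"
```

A first-match classifier like this is only a triage hint. Telling a flaky test apart from a new defect still needs the historical issue search described above.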

Usage notes and constraints:

  • Requires execution within a valid checkout of dotCMS/core.
  • Depends on Python 3.8+ and an authenticated GitHub CLI (gh) environment.
  • Always prioritize triage and historical issue searches before deep code analysis to optimize token usage and engineering time.
  • Use the provided subcommands (e.g., --metadata, --logs, --evidence) to progressively gather data rather than fetching full logs in every step.
  • Avoid manual ad-hoc parsing of JSON logs; use the integrated utility functions to ensure consistent and accurate data extraction.
  • The tool is optimized to detect specific signals such as flaky tests, infrastructure connectivity issues, and deployment/authentication errors.
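
The environment requirements above can be checked up front with a small preflight routine. The sketch below is an assumption about what such a check might look like, not the skill's own preflight script. The function name `preflight` is hypothetical, and the dotCMS/core check simply looks for that string in the `origin` remote URL.

```python
import shutil
import subprocess
import sys
from typing import List, Tuple


def preflight(min_python: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []

    # Python 3.8+ is the stated minimum.
    if sys.version_info < min_python:
        problems.append(
            "Python %d.%d+ required, found %d.%d"
            % (min_python + tuple(sys.version_info[:2]))
        )

    # The GitHub CLI must be installed and authenticated.
    if shutil.which("gh") is None:
        problems.append("GitHub CLI (gh) not found on PATH")
    else:
        auth = subprocess.run(["gh", "auth", "status"],
                              capture_output=True, text=True)
        if auth.returncode != 0:
            problems.append("gh is not authenticated; run `gh auth login`")

    # The working directory should be a checkout of dotCMS/core.
    if shutil.which("git") is None:
        problems.append("git not found on PATH")
    else:
        remote = subprocess.run(["git", "remote", "get-url", "origin"],
                                capture_output=True, text=True)
        if remote.returncode != 0 or "dotcms/core" not in remote.stdout.lower():
            problems.append("current directory is not a dotCMS/core checkout")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

Collecting every problem into a list, rather than stopping at the first one, lets an engineer fix the whole environment in one pass before gathering any logs.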

Repository Stats

Stars: 943
Forks: 480
Open Issues: 820
Language: Java
Default Branch: main
Sync Status: Idle
Last Synced: Apr 29, 2026, 02:57 PM