cicd-diagnostics
Diagnose dotCMS CI/CD GitHub Actions failures, including PR builds, merge queue issues, and nightly test reports.
Introduction
The cicd-diagnostics skill is an engineering toolkit for senior platform engineers who troubleshoot and resolve CI/CD pipeline failures in the dotCMS/core repository. It provides a structured, evidence-based workflow for moving from raw failure logs to a clear root cause. This guards against a common anti-pattern: diving into source code analysis before confirming whether a failure is a known issue, a flaky test, or an infrastructure problem.
- Automatically identifies failed jobs and steps within GitHub Actions workflows (PRs, merge queues, trunk, and nightly runs).
- Analyzes logs to extract stack traces, assertion failures, and infrastructure errors while filtering out noise from 'continue-on-error' steps.
- Leverages built-in diagnostic scripts to perform preflight checks, workspace management, and structured evidence gathering using the diagnose.py utility.
- Searches existing GitHub issues for historical context on test failures, ensuring that known issues are identified before starting deep-dive investigations.
- Provides advanced log analytics via evidence.py to classify failures as new defects, flaky tests, or environment-related timeouts.
- Facilitates comparisons between different workflow types, such as discrepancies between PR validation and full merge queue test suites.
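To make the classification idea above concrete, here is a minimal sketch of how a log excerpt might be sorted into the categories the skill describes. It is not the logic of evidence.py, which this page does not show: the function name classify_failure, the regular expressions, the category labels, and the test name FooTest.testBar are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical signal patterns. The real rules used by evidence.py are not
# documented here; these are illustrative assumptions only.
INFRA_PATTERNS = [
    r"connection (refused|reset|timed out)",
    r"could not resolve host",
    r"read timed out",
]
AUTH_PATTERNS = [
    r"authentication failed",
    r"bad credentials",
]
ASSERTION_PATTERNS = [
    r"AssertionError",
    r"expected:<.*> but was:<.*>",
]


def classify_failure(log_text, known_flaky=frozenset()):
    """Return a coarse category label for a failure log excerpt."""
    lowered = log_text.lower()
    # A test already tracked as flaky takes priority over other signals.
    for test_name in known_flaky:
        if test_name.lower() in lowered:
            return "flaky-test"
    if any(re.search(p, log_text, re.IGNORECASE) for p in INFRA_PATTERNS):
        return "infrastructure"
    if any(re.search(p, log_text, re.IGNORECASE) for p in AUTH_PATTERNS):
        return "deployment-or-auth"
    if any(re.search(p, log_text) for p in ASSERTION_PATTERNS):
        return "possible-new-defect"
    return "unclassified"


if __name__ == "__main__":
    print(classify_failure("java.net.ConnectException: Connection refused"))
    print(classify_failure("java.lang.AssertionError: expected:<1> but was:<2>"))
```

The ordering is a design choice in this sketch: known-flaky matches are checked first so that a previously reported test is not re-investigated as a new defect, which mirrors the triage-first approach described above.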
Usage notes and constraints:
- Requires execution within a valid checkout of dotCMS/core.
- Depends on Python 3.8+ and an authenticated GitHub CLI (gh) environment.
- Always prioritize triage and historical issue searches before deep code analysis to optimize token usage and engineering time.
- Use the provided subcommands (e.g., --metadata, --logs, --evidence) to progressively gather data rather than fetching full logs in every step.
- Avoid manual ad-hoc parsing of JSON logs; use the integrated utility functions to ensure consistent and accurate data extraction.
- The tool is optimized to detect specific signals such as flaky tests, infrastructure connectivity issues, and deployment/authentication errors.
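The prerequisites and the triage-first ordering in the notes above can be sketched as a small preflight helper. This is an illustration under stated assumptions, not the skill's own scripts: the function names (python_ok, gh_ready, issue_search_cmd, failed_log_cmd) are hypothetical, and the skill's own subcommands are deliberately not invoked because their exact syntax is not shown on this page. The GitHub CLI commands used (gh auth status, gh issue list, gh run view --log-failed) are standard gh commands.

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the usage notes


def python_ok(version_info=sys.version_info):
    """Check that the interpreter meets the Python 3.8+ requirement."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def gh_ready():
    """Return True if gh is on PATH and reports an authenticated session."""
    if shutil.which("gh") is None:
        return False
    result = subprocess.run(
        ["gh", "auth", "status"], capture_output=True, text=True
    )
    return result.returncode == 0


def issue_search_cmd(test_name, repo="dotCMS/core", limit=10):
    """Build a gh command that searches open and closed issues for a test name."""
    return [
        "gh", "issue", "list",
        "--repo", repo,
        "--state", "all",
        "--search", test_name,
        "--json", "number,title,state",
        "--limit", str(limit),
    ]


def failed_log_cmd(run_id, repo="dotCMS/core"):
    """Build a gh command that fetches logs for failed steps only."""
    return ["gh", "run", "view", str(run_id), "--repo", repo, "--log-failed"]


if __name__ == "__main__":
    if not python_ok():
        sys.exit("Python 3.8+ is required")
    if not gh_ready():
        sys.exit("gh is missing or not authenticated; run 'gh auth login'")
    # Search known issues first; fetch failure logs only if nothing matches.
    print(" ".join(issue_search_cmd("SomeFailingTest")))
```

Building the commands as lists and running the issue search before pulling any logs keeps the cheap historical check ahead of the expensive log download, in line with the progressive data gathering recommended above.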
Repository Stats
- Stars: 943
- Forks: 480
- Open Issues: 820
- Language: Java
- Default Branch: main
- Sync Status: Idle
- Last Synced: Apr 29, 2026, 02:57 PM