
always-works-testing

Mandatory execution-based validation for all software implementation tasks. Ensures code works through empirical verification before confirmation.

Introduction

The Always Works™ Testing Philosophy serves as a rigorous standard for AI agents, bridging the gap between theoretical code logic and operational reality. Designed for developers and engineers who prioritize reliability, this skill prevents a common AI pitfall: assuming code is correct simply because it is syntactically sound. By mandating a 30-second reality check for every modification, it enforces a disciplined workflow that prioritizes observable evidence over speculative reasoning. It is intended for environments where code, APIs, UI elements, or data structures are being actively modified, ensuring that no change is confirmed to the user without prior execution and validation.
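The reality check described above can be sketched as a small shell gate: run the actual command, capture what it really prints, and only emit a confirmation when the observed output matches the expectation. This is a minimal illustration, not part of the skill itself; the `verify` helper and the example command are hypothetical.

```shell
#!/usr/bin/env sh
# Minimal sketch of an execution-based check: confirm functionality by
# running the real command and comparing observed output to expected output.
# Only a match produces a success claim; anything else is reported as a failure.

verify() {
  desc="$1"; expected="$2"; shift 2
  # Run the command under test and capture everything it actually emits.
  observed="$("$@" 2>&1)"
  if [ "$observed" = "$expected" ]; then
    echo "PASS: $desc (observed: $observed)"
  else
    echo "FAIL: $desc (expected: $expected, observed: $observed)" >&2
    return 1
  fi
}

# Illustrative use: the claim "it prints hello" is only made after observing it.
verify "greeting command prints expected text" "hello" echo "hello"
```

The key design point is that the confirmation string is derived from `observed`, so the agent literally cannot report success without having executed the command first.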

  • Systematic Verification: Mandatory execution of scripts, tests, or manual trigger checks to confirm functionality for every change.

  • Reality Economics: Minimizes long-term costs associated with debugging failed implementations by catching errors during the development loop.

  • Behavioral Guardrails: Implements a set of forbidden phrases to discourage untested assumptions, such as 'This should work' or 'I have fixed the issue' without verification.

  • Context-Aware Testing: Provides specific protocols for testing UI changes (browser interaction), API calls (curl/Postman), database queries, and configuration updates.

  • Transparency Standards: Requires explicit communication of limitations if full testing environments are unavailable, ensuring the user is always aware of the scope of verification.

  • Use this skill whenever implementing new features, refactoring existing logic, or applying hotfixes to codebases.

  • Inputs involve the current implementation context; outputs include evidence gathered from validation tools, such as bash command results, terminal logs, or browser observations.

  • Strictly avoid claiming success until the agent has observed the expected output in the runtime environment.

  • If a task exceeds the testing scope, the agent must document what was verified (e.g., syntax, logic) versus what remains untested (e.g., prod-only credentials).
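The behavioral guardrail above can be sketched as a simple phrase filter that blocks a draft confirmation if it contains one of the forbidden untested-assumption phrases. The two phrases come from the skill description; the function name and messages are illustrative, and a real implementation would likely hold the phrase list in configuration.

```shell
#!/usr/bin/env sh
# Sketch of the forbidden-phrase guardrail: reject a draft confirmation
# that asserts success without verification language backed by observation.

claim_is_allowed() {
  draft="$1"
  # grep -F matches fixed strings; -q suppresses output. Any hit on a
  # forbidden phrase means the claim is an untested assumption.
  if printf '%s\n' "$draft" | grep -Fq \
      -e "This should work" \
      -e "I have fixed the issue"; then
    echo "BLOCKED: draft contains an untested-assumption phrase" >&2
    return 1
  fi
  return 0
}

# A claim grounded in observed evidence passes the filter.
claim_is_allowed "Ran the test suite; observed 12/12 passing in the terminal log." \
  && echo "OK: claim cites observed evidence"
```

Because the filter operates on the agent's outgoing text rather than on the code, it complements (rather than replaces) the execution-based checks: it catches the phrasing of an unverified claim even when no test was run at all.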

Repository Stats

  • Stars: 0

  • Forks: 0

  • Open Issues: 0

  • Language: Not provided

  • Default Branch: main

  • Sync Status: Idle

  • Last Synced: May 3, 2026, 10:05 PM