github-wayback-recovery
Recover deleted GitHub repository content, issues, PRs, and files using Wayback Machine CDX API queries and web archive snapshots.
Introduction
The github-wayback-recovery skill reconstructs information from GitHub repositories that have been deleted or made inaccessible. It queries the Internet Archive's Wayback Machine, primarily through its Capture Index (CDX) API, to locate and extract archived snapshots of project artifacts. It is aimed at security researchers, open-source investigators, and developers who need to retrieve lost README files, historical issue discussions, pull request metadata, wiki documentation, and repository configurations that were captured before deletion. The skill provides structured query patterns for GitHub's URL structures, including blobs, trees, and collaboration artifacts such as issues and pull requests, so users do not have to assemble archive queries by hand.
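As a rough illustration of how such query patterns translate into CDX requests, the sketch below builds a prefix-search URL for one GitHub artifact type. The `GITHUB_PATTERNS` mapping and the `build_cdx_query` helper are hypothetical and are not the skill's actual interface; the query parameters themselves (`url`, `matchType`, `output`, `fl`, `filter`, `collapse`, `from`, `to`, `limit`) are standard CDX Server API options.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

# Hypothetical mapping from artifact type to the path segment after owner/repo.
# Prefixes are deliberately broad: "/pull" matches both /pull/<n> and /pulls,
# and "/commit" matches both /commit/<sha> and /commits.
GITHUB_PATTERNS = {
    "repo": "",  # broadest: also matches sibling repos sharing the name prefix
    "issues": "/issues",
    "pulls": "/pull",
    "commits": "/commit",
    "releases": "/releases",
    "wiki": "/wiki",
    "blob": "/blob/",
    "tree": "/tree/",
}


def build_cdx_query(owner, repo, artifact="repo", status="200",
                    from_ts=None, to_ts=None, limit=None):
    """Return a CDX API URL for a prefix search over one GitHub artifact type."""
    params = {
        "url": f"github.com/{owner}/{repo}{GITHUB_PATTERNS[artifact]}",
        "matchType": "prefix",   # match every captured URL starting with `url`
        "output": "json",        # JSON rows; the first row is a header
        "fl": "timestamp,original,statuscode,mimetype,digest",
        "collapse": "digest",    # drop adjacent captures with identical content
    }
    if status:
        params["filter"] = f"statuscode:{status}"  # e.g. keep only HTTP 200
    if from_ts:
        params["from"] = from_ts  # 1 to 14 digit timestamp, yyyyMMddhhmmss
    if to_ts:
        params["to"] = to_ts
    if limit:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"
```

For example, `build_cdx_query("octocat", "hello-world", "issues", from_ts="2019", to_ts="2021")` asks for successful captures of issue pages taken between 2019 and 2021. Narrowing by artifact type and status code keeps the result set small compared with a bare repository-wide prefix search.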
- Performs automated archival availability checks to determine whether a target repository has been indexed.
- Uses the CDX API to run precise searches with prefix matching, status-code filtering, and timestamp ranges to reduce noise.
- Maps GitHub-specific URL patterns (commits, PRs, issues, releases, wiki) to corresponding archive-friendly query strings.
- Supports forensic reconstruction of project metadata, including stars, languages, and license information.
- Works alongside other forensic skills, such as github-commit-recovery and github-archive, for multi-layered historical analysis.
- Recovers non-code artifacts such as issue bodies, PR comments, and release notes, which are often preserved in the archive even after the repository itself is deleted.
- Requires an internet connection to reach the Wayback Machine APIs (archive.org).
- Success depends entirely on whether web archiving services crawled and captured the content before it was deleted.
- Cannot recover private repositories, content behind authentication, or full Git commit history; it recovers rendered HTML web snapshots only.
- Combine with local git analysis tools when the recovered metadata reveals a commit SHA.
- Input typically consists of the repository owner and name; output consists of URLs pointing to specific timestamped snapshots.
- Always check the github-archive skill first for structured event data before falling back to web archive scraping.
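The availability check can be sketched with Python's standard library and the Wayback Machine availability endpoint (`https://archive.org/wayback/available`), which reports the capture closest to an optional timestamp under `archived_snapshots.closest` and returns an empty `archived_snapshots` object when nothing was captured. The function names below are hypothetical; only `check_repo` touches the network, so the parsing logic can be exercised offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(owner, repo, timestamp=None):
    """Build an availability query for a repository's landing page."""
    params = {"url": f"github.com/{owner}/{repo}"}
    if timestamp:
        params["timestamp"] = timestamp  # prefer captures near this time
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Return (timestamp, snapshot_url) for the closest capture, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


def check_repo(owner, repo, timestamp=None, timeout=30):
    """Query archive.org (network required) and report the closest capture."""
    with urlopen(availability_url(owner, repo, timestamp), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

A `None` result means the landing page was never captured, which is a strong hint that per-artifact CDX searches will also come back sparse.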
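To turn CDX results into the timestamped snapshot URLs described above, a minimal sketch follows. It assumes the query used `output=json`, where the first row lists field names and each later row is one capture. The `snapshot_urls` helper is hypothetical; the `web.archive.org/web/<timestamp>/<original>` form is the standard snapshot address, and appending `id_` to the timestamp requests the originally captured bytes without the Wayback toolbar or link rewriting.

```python
def snapshot_urls(cdx_rows, raw=False):
    """Convert CDX `output=json` rows into Wayback snapshot URLs.

    The first row is a header naming the fields, so column positions are
    looked up by name instead of being hard-coded.
    """
    if not cdx_rows:
        return []  # no captures (or an empty response body parsed as [])
    header, *records = cdx_rows
    ts_idx = header.index("timestamp")
    orig_idx = header.index("original")
    suffix = "id_" if raw else ""  # id_ = unmodified archived content
    return [
        f"https://web.archive.org/web/{row[ts_idx]}{suffix}/{row[orig_idx]}"
        for row in records
    ]
```

Looking columns up by name keeps the helper working whether or not the query narrowed the fields with `fl`.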
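When archived URLs expose commit pages, the SHAs in them are what local git analysis or the github-commit-recovery skill would need. The sketch below, a hypothetical `commit_shas` helper built on a regular expression, collects unique owner, repository, and SHA triples from a list of original URLs. It assumes commit pages follow GitHub's `/<owner>/<repo>/commit/<sha>` layout and accepts abbreviated SHAs of 7 to 40 hex characters.

```python
import re

# Hypothetical pattern for GitHub commit pages; the SHA may be abbreviated.
COMMIT_RE = re.compile(
    r"github\.com/([^/]+)/([^/]+)/commit/([0-9a-fA-F]{7,40})\b"
)


def commit_shas(urls):
    """Return unique (owner, repo, sha) triples in first-seen order."""
    found = {}  # dict preserves insertion order and deduplicates keys
    for url in urls:
        match = COMMIT_RE.search(url)
        if match:
            owner, repo, sha = match.groups()
            found.setdefault((owner, repo, sha.lower()), None)
    return list(found)
```

Lowercasing the SHA merges captures that differ only in letter case, which would otherwise look like distinct commits.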
Repository Stats
- Stars: 2,385
- Forks: 367
- Open Issues: 17
- Language: Python
- Default Branch: main
- Sync Status: Idle
- Last Synced: Apr 29, 2026, 07:52 AM