
research-review

Get deep, critical, NeurIPS/ICML-style peer reviews of your research, paper drafts, and experimental setups using external LLMs via Codex MCP.

Introduction

The research-review skill is designed for ML researchers and developers who need high-reasoning, critical feedback to escape local minima and blind spots in their experimental methodology or academic writing. Through the Codex MCP interface, the skill orchestrates a cross-model collaboration: the primary agent manages your project context while an external high-reasoning model (such as GPT-5.4, o3, or Claude 3.5) acts as an adversarial, senior-level reviewer. This adversarial setup probes for logical gaps, unjustified claims, and narrative weaknesses that self-review tends to miss.

  • Performs multi-round, iterative critique sessions with persistent thread history tracking via Codex MCP.

  • Generates actionable outputs, including experiment designs, claims-to-results matrices, mock NeurIPS/ICML reviews, and paper structure outlines.

  • Automatically saves comprehensive review documents, trace logs, and follow-up plans to your project root or memory folders.

  • Integrates with research workflows in Claude Code, Cursor, Trae, and other agent-first IDEs, supporting diverse model providers.

  • Employs xhigh reasoning configurations to ensure depth and precision in complex scientific argumentation.

  • Use this when you have a paper draft, a set of experimental results, or a research proposal that requires rigorous validation.

  • Input requirements include clear research context, project narrative files (STORY.md, README.md), and specific research questions; the agent will compile these into a briefing before the reviewer call.

  • Expected outputs are structured reviews, prioritized TODO lists for compute-efficient experiments, and refined narrative strategies.

  • Constraints: Ensure the Codex MCP server is configured correctly; the external reviewer requires a clear, comprehensive prompt in Round 1 to provide high-quality, actionable feedback. Focus on iterative refinement rather than one-shot queries for the best results.
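The briefing step described above (compiling narrative files and research questions before the Round-1 reviewer call) can be sketched roughly as follows. The file names STORY.md and README.md come from the input requirements; the function name, section layout, and everything else here are illustrative assumptions, not the skill's actual implementation.

```python
from pathlib import Path

# Narrative files named in the skill's input requirements; the rest of this
# sketch (function name, section headers) is a hypothetical illustration.
NARRATIVE_FILES = ["STORY.md", "README.md"]

def compile_briefing(project_root, research_questions):
    """Assemble a Round-1 briefing for the external reviewer.

    Concatenates whichever narrative files exist in the project root,
    then appends the specific research questions as a bulleted section.
    """
    root = Path(project_root)
    sections = []
    for name in NARRATIVE_FILES:
        path = root / name
        if path.exists():
            sections.append(f"## {name}\n{path.read_text()}")
    questions = "\n".join(f"- {q}" for q in research_questions)
    sections.append(f"## Research questions\n{questions}")
    return "\n\n".join(sections)
```

A briefing assembled this way gives the external reviewer the full project narrative up front, which is what the Round-1 constraint above asks for: one clear, comprehensive prompt rather than drip-fed context.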

Repository Stats

Stars: 7,821
Forks: 729
Open Issues: 53
Language: Python
Default Branch: main
Last Synced: Apr 30, 2026, 12:53 PM