Data Analysis

paperbanana

Generate publication-quality statistical plots from CSV or JSON data files using AI-driven automated visualization.

Introduction

PaperBanana is an agentic tool that lets AI researchers and data scientists automate the creation of publication-quality statistical visualizations. It covers the workflow from raw tabular data to finished academic figures, aiming for plots that meet the aesthetic and formatting standards of formal research papers. Using large language models and vision-language models, the agent interprets the user's intent and the structure of the data to produce the requested chart.

  • Intelligent Plot Generation: Translates natural language intent (e.g., 'Bar chart comparing model accuracy across benchmarks') into accurate statistical visual representations.

  • Data-Aware Processing: Handles common research data formats, specifically parsing CSV files into structured JSON or accepting raw JSON datasets directly.

  • Iterative Refinement: Employs a multi-phase generation pipeline that uses iterative feedback to improve visual quality, readability, and scientific accuracy.

  • Multi-Provider Support: Integrates with various VLM and image generation backends, including OpenAI (GPT-5.2), Azure OpenAI, and Google Gemini, ensuring robust performance across different infrastructure environments.

  • Batch Processing: Supports large-scale figure production through manifest-driven batch workflows, allowing users to generate numerous plots from extensive experimental datasets in a single execution.

  • MCP Compatibility: Built as a Model Context Protocol server, so it can be integrated into tools such as Claude Code or Cursor for direct CLI and agentic use.

  • Input formats: CSV and JSON files containing metrics, experimental results, or benchmarks.

  • Output: Professional PNG images ready for use in LaTeX or Markdown academic manuscripts.

  • Usage: Ideal for automating the visualization of methodology comparisons, performance benchmarks, and complex experimental data in AI research.

  • Constraints: Requires an active API key for OpenAI or Google Gemini. For the best generation outcomes, clean the data first and give it clear, consistent keys (CSV column headers or JSON field names).
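
The "Data-Aware Processing" item above mentions parsing CSV files into structured JSON. The following is a minimal sketch of that general idea using only the Python standard library. It is not PaperBanana's actual parser: the function name `csv_to_records`, the numeric-conversion rule, and the sample data are all assumptions made for illustration.

```python
import csv
import io
import json


def csv_to_records(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of row dicts (hypothetical helper).

    Values that look numeric are converted to float; everything else
    is kept as a string. This conversion rule is an assumption.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        record = {}
        for key, value in row.items():
            try:
                record[key] = float(value)
            except (TypeError, ValueError):
                # Non-numeric (or missing) cell: keep the raw value.
                record[key] = value
        records.append(record)
    return records


# Made-up sample data, for illustration only.
sample = "model,benchmark,accuracy\nA,bench1,0.71\nB,bench1,0.78\n"
records = csv_to_records(sample)

# Serialize the rows as a JSON array of objects.
print(json.dumps(records, indent=2))
```

Keeping one object per row, keyed by column header, is one common way to make the data "properly keyed": each metric can be referred to by name in a plotting request.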

Repository Stats

Stars: 1,386
Forks: 213
Open Issues: 37
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: May 3, 2026, 11:43 PM