translate

Translate research papers in Markdown while preserving LaTeX formulas, code blocks, and images. Supports batch processing, retries, and portable bundles.

Introduction

The translate skill is a tool for academic researchers and AI agents who process research literature. It translates structured research documents into any of seven target languages (zh, en, ja, ko, de, fr, es) while keeping scientific formatting intact. Paragraph-aware chunking and heuristic language detection keep mathematical formulas, code snippets, and image references correctly mapped through translation. It suits users with large paper libraries who need to cross language barriers without losing the technical context required for research, writing, or analysis.
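One way to picture how formulas, code, and image references can survive translation is placeholder masking followed by paragraph-level chunking. The sketch below is an illustration under assumptions, not the skill's actual implementation: the `PROTECTED` pattern, the `⟦n⟧` placeholder format, the helper names, and the character-based `chunk_size` are all hypothetical.

```python
import re

# Hypothetical pattern covering the spans that should not be translated.
PROTECTED = re.compile(
    r"```.*?```"                # fenced code block
    r"|\$\$.*?\$\$"             # display math
    r"|\$[^$\n]+\$"             # inline math
    r"|!\[[^\]]*\]\([^)]*\)",   # image reference
    re.DOTALL,
)


def mask(text):
    """Swap protected spans for numbered placeholders; return masked text and saved spans."""
    saved = []

    def _stash(match):
        saved.append(match.group(0))
        return f"⟦{len(saved) - 1}⟧"

    return PROTECTED.sub(_stash, text), saved


def unmask(text, saved):
    """Restore the original spans in place of their placeholders."""
    return re.sub(r"⟦(\d+)⟧", lambda m: saved[int(m.group(1))], text)


def chunk_paragraphs(text, chunk_size=1500):
    """Group whole paragraphs into chunks, never splitting inside a paragraph.

    chunk_size is measured in characters here (an assumption); a single
    paragraph longer than chunk_size becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for para in re.split(r"\n\s*\n", text):
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > chunk_size:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Masking runs before chunking so that blank lines inside a code block cannot be mistaken for paragraph boundaries. Each chunk would then be translated and passed through `unmask` with the shared `saved` list.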

  • Concurrent processing of document segments to speed up translation, with progress reported in the terminal.
  • Per-segment error handling: each segment is retried several times with exponential backoff, so a transient network error does not fail the whole job.
  • State tracking in local working directories, so an interrupted task can resume where it stopped.
  • Portable output bundles that place translated files and their assets (images) in dedicated directories for easy sharing.
  • Configurable batch processing and integration with standard research pipelines for automated literature enrichment.
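The concurrency, retry, and resume behaviour listed above can be sketched together in a few lines. This is a minimal illustration under stated assumptions rather than the skill's own code: `with_retries`, `translate_segments`, the JSON state file, and the default values are hypothetical, and `translate_fn` stands in for whatever call reaches the translation backend.

```python
import json
import random
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def with_retries(fn, arg, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(arg); on failure wait base_delay * 2**attempt (plus jitter) and retry."""
    for attempt in range(attempts):
        try:
            return fn(arg)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))


def translate_segments(segments, translate_fn, state_path, concurrency=4, **retry_opts):
    """Translate segments concurrently, skipping any already recorded in state_path."""
    state_path = Path(state_path)
    done = json.loads(state_path.read_text()) if state_path.exists() else {}
    pending = [i for i in range(len(segments)) if str(i) not in done]

    def work(i):
        return i, with_retries(translate_fn, segments[i], **retry_opts)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for i, result in pool.map(work, pending):
            done[str(i)] = result
            state_path.write_text(json.dumps(done))  # checkpoint after each segment
            print(f"{len(done)}/{len(segments)} segments translated")
    return [done[str(i)] for i in range(len(segments))]
```

In this sketch a segment that still fails after its last attempt raises an error, but every segment checkpointed before that point stays in the state file, so rerunning the function only retries the missing ones.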

Usage notes and practical constraints:

  • Target language and concurrency settings can be adjusted in the config.yaml file or overridden via CLI flags.
  • The --force flag completely resets the translation task for a specific paper, deleting temporary work files.
  • The --portable flag generates self-contained translation bundles at workspace/_system/translation-bundles/, ensuring images remain accessible outside the original source structure.
  • Translation is designed for Markdown-based paper representations. If working with PDF, ensure the pipeline has completed the initial conversion (PDF to Markdown) steps.
  • Translation is resource-intensive; tune chunk_size and concurrency to balance throughput against API rate limits.
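The precedence described above, where CLI flags override values from config.yaml, might look like the following. Only `--force`, `--portable`, `chunk_size`, and `concurrency` appear in the notes; the other flag names (`--lang`, `--concurrency`, `--chunk-size`), the `target_lang` key, and all default values are assumptions made for illustration, and a plain dict stands in for the parsed config.yaml.

```python
import argparse

# Stand-in for values parsed from config.yaml; key names and defaults are assumptions.
CONFIG_DEFAULTS = {"target_lang": "zh", "concurrency": 4, "chunk_size": 1500}


def build_parser():
    parser = argparse.ArgumentParser(prog="translate")
    # Hypothetical flag names for the overridable settings.
    parser.add_argument("--lang", dest="target_lang",
                        choices=["zh", "en", "ja", "ko", "de", "fr", "es"])
    parser.add_argument("--concurrency", type=int)
    parser.add_argument("--chunk-size", dest="chunk_size", type=int)
    # Flags named in the usage notes.
    parser.add_argument("--force", action="store_true")
    parser.add_argument("--portable", action="store_true")
    return parser


def resolve_settings(argv, config=CONFIG_DEFAULTS):
    """Start from the config values, then apply any flag the user actually passed."""
    args = vars(build_parser().parse_args(argv))
    settings = dict(config)
    settings.update({key: value for key, value in args.items() if value is not None})
    return settings
```

For example, `resolve_settings(["--lang", "ja", "--portable"])` keeps the configured concurrency and chunk size while switching the target language and enabling the portable bundle.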

Repository Stats

Stars: 396
Forks: 57
Open Issues: 10
Language: Python
Default Branch: main
Sync Status: Idle
Last Synced: May 1, 2026, 07:07 AM