split-pdf

Introduction

The split-pdf skill is a research tool that works around the context limits of large language models when reading long academic documents. It splits a long PDF into four-page chunks and reads them iteratively, building structured notes as it goes. It suits researchers, students, and analysts who need to process papers, book chapters, or long technical reports without hitting unrecoverable context window errors or getting shallow, hallucinated summaries.

Acquires academic papers either from a local file path or, given a search query, from the web using the WebSearch and WebFetch tools.
Implements a rigorous splitting protocol using PyPDF2 to create four-page chunks, stored in dedicated, organized build directories to prevent source material modification.
Enforces a pause-and-confirm interaction model in which the agent reads three splits (up to 12 pages) per cycle to maintain focus and accuracy.
Performs structured information extraction, focusing on research questions, target audiences, methodology, and key contributions, synthesized into a persistent notes.md file.
Provides intelligent state management, checking for existing extractions or split files before re-processing to save time and token costs.
Always provide either a specific file path or a precise search query (title, author, year) to initialize the process.
The original PDF is left untouched; all processing occurs on temporary derivative split files, which protects the integrity of your document library.
If an existing extract (basename_text.md) is found, the agent will prompt you to use it rather than re-reading the PDF from scratch.
The workflow is strictly sequential: retrieve, split, read in batches, update notes, and confirm before continuing to the next 12-page block.
Ensure the environment has access to PyPDF2 for the splitting operations; the agent will attempt to install it if missing.
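
The four-page splitting and three-split batching described above can be sketched roughly as follows. This is a minimal illustration and not the skill's actual implementation: it assumes PyPDF2 3.x (PdfReader and PdfWriter), and the function names, the build directory argument, and the output file naming scheme are all hypothetical.

```python
from pathlib import Path

PAGES_PER_SPLIT = 4   # chunk size stated in the description
SPLITS_PER_BATCH = 3  # splits read per cycle before pausing to confirm


def chunk_ranges(total_pages, size=PAGES_PER_SPLIT):
    """Return zero-based, half-open (start, end) page ranges of at most `size` pages."""
    return [(s, min(s + size, total_pages)) for s in range(0, total_pages, size)]


def batches(items, per_batch=SPLITS_PER_BATCH):
    """Group splits into reading cycles of at most `per_batch` splits each."""
    return [items[i:i + per_batch] for i in range(0, len(items), per_batch)]


def existing_extract(source):
    """Return the path of basename_text.md if it exists, else None.

    Assumption: the extract sits next to the source PDF; the real location may differ.
    """
    source = Path(source)
    candidate = source.with_name(f"{source.stem}_text.md")
    return candidate if candidate.exists() else None


def split_pdf(source, build_dir):
    """Write four-page split files into build_dir; the source PDF is only read."""
    # Imported lazily so the planning helpers above work without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    source = Path(source)
    out_dir = Path(build_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    reader = PdfReader(str(source))
    outputs = []
    for start, end in chunk_ranges(len(reader.pages)):
        # Hypothetical naming scheme: one-based, inclusive page numbers.
        target = out_dir / f"{source.stem}_p{start + 1:03d}-{end:03d}.pdf"
        if not target.exists():  # reuse splits left by an earlier run
            writer = PdfWriter()
            for index in range(start, end):
                writer.add_page(reader.pages[index])
            with open(target, "wb") as handle:
                writer.write(handle)
        outputs.append(target)
    return outputs
```

For a 30-page paper, chunk_ranges(30) yields eight splits (the last covering only two pages), and batches(...) groups them into reading cycles of three, three, and two splits. Skipping split files that already exist mirrors the state check described above, so a repeated run does not redo the work.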
