Word Document Handler

Comprehensive Microsoft Word (.docx) handler for creation, editing, text extraction, tracked changes, and XML-level document analysis.

Introduction

The Word Document Handler is a skill for working with Microsoft Word (.docx) files. It combines high-level tools with low-level Office Open XML (OOXML) manipulation, so that edits preserve the document's structure and formatting and change content precisely. It is intended for users who need to automate document workflows, perform bulk text analysis, or manage complex documents that contain tracked changes and embedded media.
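Because a .docx file is a ZIP package of XML parts, the low-level unpack, inspect, and repack cycle can be sketched with Python's standard library alone. This is a minimal illustration, not the skill's own tooling: the helper names (unpack_docx, repack_docx, paragraph_texts) are hypothetical, and the sketch does not perform the schema validation the skill runs after XML edits.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path
from typing import List

# WordprocessingML main namespace, bound to the w: prefix in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def unpack_docx(docx_path: str, out_dir: str) -> List[str]:
    """Extract every part of the .docx ZIP package into out_dir and return the part names."""
    with zipfile.ZipFile(docx_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


def repack_docx(src_dir: str, docx_path: str) -> None:
    """Zip an unpacked (and possibly edited) directory back into a .docx package."""
    src = Path(src_dir)
    files = sorted(p for p in src.rglob("*") if p.is_file())
    # Move [Content_Types].xml to the front, a common convention for OOXML packages;
    # the stable sort keeps every other part in path order.
    files.sort(key=lambda p: p.relative_to(src).as_posix() != "[Content_Types].xml")
    with zipfile.ZipFile(docx_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            zf.write(path, arcname=path.relative_to(src).as_posix())


def paragraph_texts(document_xml: str) -> List[str]:
    """Return the text of each w:p paragraph by joining its w:t runs.

    Deleted text in tracked changes is stored in w:delText, so it is not included.
    Nested paragraphs (e.g. inside text boxes) are not de-duplicated in this sketch.
    """
    root = ET.fromstring(document_xml)
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]
```

A typical round trip unpacks the package, edits word/document.xml in place, and calls repack_docx on the same directory.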

Key Features

  • Precise text extraction from .docx files into clean Markdown using pandoc, preserving document structure and tracked changes.

  • Full document creation with the docx-js library, which builds documents from its Document, Paragraph, and TextRun components.

  • Advanced editing workflows for existing files: unpacking the .docx package, editing the raw XML, and repacking the document structure automatically.

  • A validation suite of custom schema validators that checks XML compliance with the ISO/IEC 29500 standard after manual modification.

  • Built-in support for tracked changes (redlines), recorded under a consistent author attribute to maintain the document's audit trail.

  • Conversion pipelines in which LibreOffice turns .docx documents into PDF and Poppler tools (such as pdftoppm) render the PDF pages as high-resolution JPEG images for visual auditing and analysis.

Usage Guidelines

  • Always read the provided docx-js.md and ooxml.md documentation files in full, without line-range limits, before creating or editing documents.

  • When editing raw XML, focus on the key package parts: word/document.xml (the main document body), word/comments.xml (comments), and the media assets under word/media/.

  • Always validate the document with the provided validation.py script immediately after any XML modification, so that corruption is caught early.

  • Set the author attribute w:author="Claude" on every tracked change so that edits are attributed consistently.

  • When converting documents to images, set the resolution (-r 150) or restrict the page range to balance image quality against output file size.

  • The following dependencies must be preinstalled in the environment: pandoc, docx (the npm package that provides docx-js), LibreOffice, and poppler-utils.
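In WordprocessingML, a tracked insertion is a w:ins element and a tracked deletion is a w:del element, each carrying w:id and w:author (and optionally w:date); deleted text is held in w:delText rather than w:t. The helper below is a hypothetical sketch (tracked_run is an illustrative name, not part of the skill) that builds such an element with the standard library and defaults the author to "Claude", as the guidelines above require. It does not handle run formatting or insertion into an existing paragraph.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("w", W_NS)  # serialize with the conventional w: prefix


def _w(name: str) -> str:
    """Qualify a WordprocessingML element or attribute name with the w: namespace."""
    return f"{{{W_NS}}}{name}"


def tracked_run(kind: str, text: str, rev_id: int, author: str = "Claude",
                date: Optional[str] = None) -> ET.Element:
    """Build a w:ins (insertion) or w:del (deletion) element wrapping one run of text."""
    if kind not in ("ins", "del"):
        raise ValueError("kind must be 'ins' or 'del'")
    if date is None:
        # ISO 8601 UTC timestamp, e.g. 2024-01-01T00:00:00Z
        date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    rev = ET.Element(_w(kind), {_w("id"): str(rev_id), _w("author"): author, _w("date"): date})
    run = ET.SubElement(rev, _w("r"))
    # Deleted text must be stored in w:delText; inserted text uses an ordinary w:t.
    t = ET.SubElement(run, _w("delText" if kind == "del" else "t"))
    t.text = text
    if text != text.strip():
        # Without xml:space="preserve", leading/trailing spaces may be dropped.
        t.set(f"{{{XML_NS}}}space", "preserve")
    return rev
```

The returned element still has to be placed inside a w:p in word/document.xml, each revision should use a w:id not reused by another revision, and the edited file should then go through validation.py.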
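The extraction and image-conversion steps described above can be sketched as plain command-line invocations. The function names here are hypothetical, and the sketch assumes pandoc, soffice (LibreOffice), and pdftoppm (from poppler-utils) are on PATH. It uses only documented options: pandoc's --track-changes, LibreOffice's --headless, --convert-to, and --outdir, and pdftoppm's -jpeg, -r, -f, and -l.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def pandoc_to_markdown_cmd(docx: str, out_md: str) -> List[str]:
    # --track-changes=all keeps insertions and deletions visible in the Markdown output.
    return ["pandoc", "--track-changes=all", docx, "-o", out_md]


def docx_to_pdf_cmd(docx: str, out_dir: str) -> List[str]:
    # Headless LibreOffice conversion; writes <stem>.pdf into out_dir.
    return ["soffice", "--headless", "--convert-to", "pdf", "--outdir", out_dir, docx]


def pdf_to_jpeg_cmd(pdf: str, prefix: str, dpi: int = 150,
                    first: Optional[int] = None, last: Optional[int] = None) -> List[str]:
    # Poppler's pdftoppm: -r sets the resolution in DPI, -f/-l limit the page range.
    cmd = ["pdftoppm", "-jpeg", "-r", str(dpi)]
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    return cmd + [pdf, prefix]


def render_pages(docx: str, out_dir: str, first: int = 1, last: int = 1) -> None:
    """Run the .docx -> PDF -> JPEG pipeline (requires soffice and pdftoppm on PATH)."""
    subprocess.run(docx_to_pdf_cmd(docx, out_dir), check=True)
    pdf = str(Path(out_dir) / (Path(docx).stem + ".pdf"))
    prefix = str(Path(out_dir) / "page")
    subprocess.run(pdf_to_jpeg_cmd(pdf, prefix, first=first, last=last), check=True)
```

Building the argument lists separately from running them keeps each command easy to inspect or log before anything executes.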

Repository Stats

Stars: 11
Forks: 2
Open Issues: 0
Language: Python
Default Branch: main
Last Synced: May 3, 2026, 10:50 PM