The Industry's First Parser & Generator
One library, every transformation. Parse complex documents into a structured AST and generate high-fidelity Markdown, HTML, or RAG-ready chunks optimized for LLM integration and modern AI workflows.
Universal Support
One library to rule them all. Built to handle the complexities of legacy and modern office standards.
DOCX
Modern Word
XLSX
Modern Excel
PPTX
Modern PowerPoint
PDF
Documents
ODT
OpenDocument Text
ODS
OpenDocument Spreadsheet
ODP
OpenDocument Presentation
RTF
Rich Text
CSV
Spreadsheets
MD
Markdown
HTML
Web Pages
Powerful Features
Designed for developers who need more than just plain text extraction.
One-Stop API
Powered by OfficeGenerator. Use OfficeConverter.convert for seamless,
one-step transformations with Intelligent Auto-Sync.
RAG Chunking
Native document splitting for AI. Supports fixed-size, structural, and semantic strategies with metadata awareness.
Structured AST
The foundation for generation. Use .to(format) directly on the AST for high-fidelity
conversion.
Universal Parser
Now supports CSV, HTML, and Markdown as input formats, enabling true any-to-any document pipelines.
High Fidelity
Unmatched support for merged table cells, anchors, bookmarks, and complex document hierarchies.
Extreme Performance
Up to 23x speedup for ODP parsing; highly optimized engines for RTF, PDF, and large Excel workbooks.
Vision & OCR
Seamlessly read text from images within documents using our integrated, pooled Tesseract.js worker engine.
Deep Metadata
Access full document properties, author info, and custom XML/ODF properties across all major formats.
Native RAG & Chunking
Don't just split text; understand it. Our structural chunking preserves the semantic context that naive splitters destroy.
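To make the contrast concrete, here is a minimal sketch of the kind of naive splitter referred to above. It is a generic illustration and not part of officeParser; the function name `naiveSplit` and its parameters are hypothetical. It cuts text every `size` characters with no regard for words, headings, or tables.

```typescript
// Generic illustration of a naive fixed-size splitter; NOT officeParser code.
function naiveSplit(text: string, size: number): string[] {
  if (size < 1) throw new RangeError("size must be at least 1");
  const pieces: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    // Cut purely by character count, ignoring all document structure.
    pieces.push(text.slice(i, i + size));
  }
  return pieces;
}

// A heading ("Setup") followed by one sentence.
const naivePieces = naiveSplit("Setup\nInstall the package.", 10);
// → ["Setup\nInst", "all the pa", "ckage."]
```

The heading is fused with half a word in the first piece, and the later pieces carry no hint of which section they came from.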
Heading Awareness
Chunks automatically inherit the context of their parent headings, ensuring the LLM knows the exact topic of every fragment.
Physical Metadata
Every chunk includes its physical origin: page number, slide name, or sheet index from the original source document.
Table Integrity
Intelligent row-level splitting for large tables. Automatically repeats headers in every chunk to maintain structural clarity.
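The three behaviours above can be pictured with a small, self-contained sketch. It does not use officeParser's actual AST or API: the `DocBlock` and `HeadingChunk` types and the `chunkByHeading` and `splitTable` functions are hypothetical names invented for this illustration, and the real implementation may differ considerably.

```typescript
// Hypothetical, simplified node shapes; NOT officeParser's real AST types.
type DocBlock =
  | { kind: "heading"; level: number; text: string; page: number }
  | { kind: "paragraph"; text: string; page: number };

interface HeadingChunk {
  headingPath: string[]; // enclosing headings, outermost first
  page: number;          // physical origin of the chunk
  text: string;
}

// Emit one chunk per paragraph, tagged with its parent headings and page.
function chunkByHeading(blocks: DocBlock[]): HeadingChunk[] {
  let open: { level: number; text: string }[] = [];
  const chunks: HeadingChunk[] = [];
  for (const block of blocks) {
    if (block.kind === "heading") {
      const level = block.level;
      // A new heading closes every open heading at the same or a deeper level.
      open = open.filter((h) => h.level < level);
      open.push({ level, text: block.text });
    } else {
      chunks.push({
        headingPath: open.map((h) => h.text),
        page: block.page,
        text: block.text,
      });
    }
  }
  return chunks;
}

// Split table rows into groups of at most maxRows, repeating the header in each.
function splitTable(header: string[], rows: string[][], maxRows: number): string[][][] {
  if (maxRows < 1) throw new RangeError("maxRows must be at least 1");
  const parts: string[][][] = [];
  for (let i = 0; i < rows.length; i += maxRows) {
    parts.push([header, ...rows.slice(i, i + maxRows)]);
  }
  return parts;
}

const headingChunks = chunkByHeading([
  { kind: "heading", level: 1, text: "Install", page: 1 },
  { kind: "paragraph", text: "Run the installer.", page: 1 },
  { kind: "heading", level: 2, text: "Linux", page: 2 },
  { kind: "paragraph", text: "Use the package manager.", page: 2 },
]);

const tableParts = splitTable(
  ["id", "name"],
  [["1", "a"], ["2", "b"], ["3", "c"]],
  2,
);
```

In this sketch the second paragraph is tagged with the path Install, Linux and page 2, and the three-row table becomes two parts that each begin with the `id`, `name` header row.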
Specifications
Detailed technical documentation for the core components of officeParser.
OfficeConverter
High-level API for simple, one-step document transformations with zero-config defaults.
Parser Config
Full reference of parsing options including OCR, attachment handling, and delimiters.
Generator Config
Complete guide to common and format-specific settings for HTML, Markdown, PDF, and more.
Generator API & RAG
Core API usage for granular control and native strategies for AI-ready chunking.
AST Reference
Understand the structure of the Abstract Syntax Tree and all supported node types.
Debugging
Tips for handling edge cases, large files, and process lifecycle management.
AST Visualizer
Drop a file to see the power of our structured parsing engine in action.
DOCX, PPTX, XLSX, PDF, RTF, CSV, MD, HTML, ODT, ODP, ODS