The Industry's First Parser & Generator

One library, every transformation. Parse complex documents into a structured AST and generate high-fidelity Markdown, HTML, or RAG-ready chunks optimized for LLM integration and modern AI workflows.

Try Visualizer | View on NPM

11+ Formats Supported

Universal Support

One library to rule them all. Built to handle the complexities of legacy and modern office standards.

DOCX: Modern Word
XLSX: Modern Excel
PPTX: Modern PowerPoint
PDF: Documents
ODT: OpenDocument Text
ODS: OpenDocument Spreadsheet
ODP: OpenDocument Presentation
RTF: Rich Text
CSV: Spreadsheets
MD: Markdown
HTML: Web Pages

Powerful Features

Designed for developers who need more than just plain text extraction.

🔄

One-Stop API

Powered by OfficeGenerator. Use OfficeConverter.convert for seamless, one-step transformations with Intelligent Auto-Sync.

🧩

RAG Chunking

Native document splitting for AI. Supports fixed-size, structural, and semantic strategies with metadata awareness.
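As a rough illustration of the simplest of these strategies, the following sketch splits plain text into fixed-size windows with an optional overlap. It is a standalone example, not the library's implementation: `chunkFixed`, `TextChunk`, and the parameter names are hypothetical, and sizes are counted in characters as an assumption (the library may count tokens or something else).

```typescript
// Hypothetical chunk shape for illustration; not the library's output type.
interface TextChunk {
  index: number; // position of the chunk in the output sequence
  start: number; // offset of the chunk's first character in the source text
  text: string;
}

// Split `text` into windows of at most `size` characters. Each window
// starts `size - overlap` characters after the previous one, so adjacent
// windows share `overlap` characters.
function chunkFixed(text: string, size: number, overlap = 0): TextChunk[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const chunks: TextChunk[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push({
      index: chunks.length,
      start,
      text: text.slice(start, start + size),
    });
    // Stop once a window has reached the end of the text, so no trailing
    // chunk is emitted that consists only of overlap.
    if (start + size >= text.length) break;
  }
  return chunks;
}
```

Structural and semantic strategies replace the fixed window with boundaries taken from the document itself, such as headings or table rows.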

🌳

Structured AST

The foundation for generation. Use .to(format) directly on the AST for high-fidelity conversion.
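To make the parse-then-generate idea concrete, here is a minimal standalone sketch of an AST with a `.to(format)` method. Everything in it is hypothetical: the node types, the `DocAst` class, and the two supported formats are simplified stand-ins, and the library's real AST and generator options are described in its own specifications.

```typescript
// Hypothetical, simplified node types; the real AST has many more.
type DocNode =
  | { type: "heading"; level: number; text: string }
  | { type: "paragraph"; text: string }
  | { type: "list"; items: string[] };

type OutputFormat = "markdown" | "html";

// Escape the characters that are unsafe in HTML text content.
const escapeHtml = (s: string): string =>
  s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");

// Render a single node in the requested format.
function renderNode(node: DocNode, format: OutputFormat): string {
  switch (node.type) {
    case "heading":
      return format === "markdown"
        ? `${"#".repeat(node.level)} ${node.text}`
        : `<h${node.level}>${escapeHtml(node.text)}</h${node.level}>`;
    case "paragraph":
      return format === "markdown"
        ? node.text
        : `<p>${escapeHtml(node.text)}</p>`;
    case "list":
      return format === "markdown"
        ? node.items.map((item) => `- ${item}`).join("\n")
        : `<ul>${node.items.map((item) => `<li>${escapeHtml(item)}</li>`).join("")}</ul>`;
  }
}

// The AST owns the nodes; every output format is generated from the same tree.
class DocAst {
  nodes: DocNode[];

  constructor(nodes: DocNode[]) {
    this.nodes = nodes;
  }

  to(format: OutputFormat): string {
    const separator = format === "markdown" ? "\n\n" : "\n";
    return this.nodes.map((node) => renderNode(node, format)).join(separator);
  }
}
```

Because each generator reads the same tree, adding an output format in this design means adding one more branch per node type rather than another parser.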

🌍

Universal Parser

Now supports CSV, HTML, and Markdown as input formats, enabling true any-to-any document pipelines.

📎

High Fidelity

Unmatched support for merged table cells, anchors, bookmarks, and complex document hierarchies.

⚡

Extreme Performance

Up to 23x speedup for ODP parsing; highly optimized engines for RTF, PDF, and large Excel workbooks.

🔍

Vision & OCR

Seamlessly read text from images within documents using our integrated, pooled Tesseract.js worker engine.

⚙️

Deep Metadata

Access full document properties, author info, and custom XML/ODF properties across all major formats.

Optimized for AI

Native RAG & Chunking

Don't just split text—understand it. Our structural chunking preserves the semantic context that naive splitters destroy.

📑

Heading Awareness

Chunks automatically inherit the context of their parent headings, ensuring the LLM knows the exact topic of every fragment.
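One common way to implement heading inheritance is to keep a stack of the currently open headings while walking the document, and to attach a copy of that stack to every chunk. The sketch below shows the idea on a flat list of blocks. The `DocBlock` and `HeadingChunk` types and the `chunkByHeadings` function are hypothetical and are not taken from the library.

```typescript
// Hypothetical flat block list; a real AST would be richer.
type DocBlock =
  | { type: "heading"; level: number; text: string }
  | { type: "paragraph"; text: string };

interface HeadingChunk {
  headingPath: string[]; // enclosing headings, outermost first
  text: string;
}

// Emit one chunk per paragraph, tagged with the headings that enclose it.
function chunkByHeadings(blocks: DocBlock[]): HeadingChunk[] {
  const chunks: HeadingChunk[] = [];
  const open: { level: number; text: string }[] = [];

  for (const block of blocks) {
    if (block.type === "heading") {
      // A new heading closes every open heading at the same or a deeper level.
      while (open.length > 0 && open[open.length - 1].level >= block.level) {
        open.pop();
      }
      open.push({ level: block.level, text: block.text });
    } else {
      chunks.push({
        headingPath: open.map((h) => h.text),
        text: block.text,
      });
    }
  }
  return chunks;
}
```

A paragraph under "Guide > Install" would then carry `["Guide", "Install"]`, which can be prepended to the chunk text before embedding so the fragment keeps its topic.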

📍

Physical Metadata

Every chunk includes its physical origin: page number, slide name, or sheet index from the original source document.

📊

Table Integrity

Intelligent row-level splitting for large tables. Automatically repeats headers in every chunk to maintain structural clarity.
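The row-level splitting described here can be sketched in a few lines: cut the data rows into groups and attach the same header to each group. This is an illustrative standalone example under simple assumptions (a single header row, a fixed row limit per chunk); `chunkTable`, `TableChunk`, and `toMarkdownTable` are hypothetical names, not the library's API.

```typescript
// Hypothetical chunk shape for a table fragment.
interface TableChunk {
  header: string[];
  rows: string[][];
  rowStart: number; // zero-based index of this chunk's first data row
}

// Split data rows into groups of at most `maxRows`, repeating the header
// in every group so each chunk can be read on its own.
function chunkTable(
  header: string[],
  rows: string[][],
  maxRows: number,
): TableChunk[] {
  if (!Number.isInteger(maxRows) || maxRows < 1) {
    throw new RangeError("maxRows must be a positive integer");
  }
  const chunks: TableChunk[] = [];
  for (let rowStart = 0; rowStart < rows.length; rowStart += maxRows) {
    chunks.push({
      header,
      rows: rows.slice(rowStart, rowStart + maxRows),
      rowStart,
    });
  }
  return chunks;
}

// Render one chunk as a pipe-delimited Markdown table.
function toMarkdownTable(chunk: TableChunk): string {
  const line = (cells: string[]): string => `| ${cells.join(" | ")} |`;
  return [
    line(chunk.header),
    line(chunk.header.map(() => "---")),
    ...chunk.rows.map(line),
  ].join("\n");
}
```

Keeping `rowStart` on each chunk lets a retrieval result be traced back to its position in the original table.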

Specifications

Detailed technical documentation for the core components of officeParser.

OfficeConverter

High-level API for simple, one-step document transformations with zero-config defaults.

Read Converter Spec →

Parser Config

Full reference of parsing options including OCR, attachment handling, and delimiters.

Read Parser Spec →

Generator Config

Complete guide to common and format-specific settings for HTML, Markdown, PDF, and more.

Read Config Spec →

Generator API & RAG

Core API usage for granular control and native strategies for AI-ready chunking.

Read Generator Spec →

AST Reference

Understand the structure of the Abstract Syntax Tree and all supported node types.

Read AST Spec →

Debugging

Tips for handling edge cases, large files, and process lifecycle management.

Read Debugging Spec →

AST Visualizer

Drop a file to see the power of our structured parsing engine in action.

High Fidelity Guarantee: This preview is built entirely from the AST output. It shows the depth and structural accuracy of the parsing engine: complex document layouts are reconstructed with near-perfect fidelity from the structured data alone.

Drag and drop a file, or click to browse.

Accepted formats: DOCX, PPTX, XLSX, PDF, RTF, CSV, MD, HTML, ODT, ODP, ODS

Settings panels: Parser Configuration, Common Generator Configuration, Custom Style Map

Output views: HTML Preview, Markdown, Chunks (RAG-ready), CSV / Spreadsheet, RTF (Rich Text)