Krzysztof Nowicki

DocWire SDK 2025.06.19 Released – Major OCR & PDF Layout Upgrades, Archive Refactor, CI Improvements

We’re back with a substantial update to DocWire SDK, the modern C++ library for structured document parsing, data extraction and secure, high-performance back-end workflows.

Version 2025.06.19 focuses on sharper OCR, more faithful PDF layout reconstruction and a brand-new archive module, alongside testing and CI upgrades.

Full release notes: https://github.com/docwire/docwire/releases/tag/2025.06.19

What’s New

1 · OCR Enhancements

  • Structured output with positional metadata: OCRParser now returns x, y, width, height plus line, paragraph and section grouping.
  • Configurable confidence filter (0–100) to ignore low-confidence words.
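
To make the confidence filter concrete, here is a minimal sketch of the idea in plain C++. It does not use DocWire's API: the `OcrWord` struct and the `filter_by_confidence` function are hypothetical names invented for illustration, and the real OCRParser output types and configuration call may look quite different.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical word record for illustration; DocWire's real OCR types may differ.
struct OcrWord {
    std::string text;
    int x, y, width, height;  // bounding box of the recognized word
    float confidence;         // recognition confidence on a 0-100 scale
};

// Keep only the words whose confidence is at least min_confidence.
std::vector<OcrWord> filter_by_confidence(const std::vector<OcrWord>& words,
                                          float min_confidence) {
    assert(min_confidence >= 0.0f && min_confidence <= 100.0f);
    std::vector<OcrWord> kept;
    for (const auto& word : words) {
        if (word.confidence >= min_confidence) {
            kept.push_back(word);
        }
    }
    return kept;
}
```

A higher threshold trades recall for cleaner text: scanned noise that is misread as short junk words tends to score low and is dropped first.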

2 · Higher-Fidelity PDF Parsing

  • Refactored PDFParser to sort elements by position, yielding more accurate text flow and layout reconstruction.
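
Position-based ordering can be sketched as follows. This is an assumption about the general technique, not PDFParser's actual implementation: `TextElement`, `reading_order` and the `line_tolerance` parameter are hypothetical, and the sketch assumes a coordinate system where y grows downward.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical text element for illustration; not DocWire's internal type.
struct TextElement {
    std::string text;
    double x;  // left edge
    double y;  // top edge, assumed to grow downward
};

// Order elements top-to-bottom, then left-to-right within each line.
// An element joins the current line when its y is within line_tolerance
// of the first element of that line.
std::vector<TextElement> reading_order(std::vector<TextElement> elems,
                                       double line_tolerance) {
    assert(line_tolerance >= 0.0);
    // Pass 1: sort strictly by vertical position.
    std::stable_sort(elems.begin(), elems.end(),
                     [](const TextElement& a, const TextElement& b) {
                         return a.y < b.y;
                     });
    // Pass 2: find each run of elements on the same line and sort it by x.
    auto line_begin = elems.begin();
    while (line_begin != elems.end()) {
        auto line_end = line_begin + 1;
        while (line_end != elems.end() &&
               line_end->y - line_begin->y <= line_tolerance) {
            ++line_end;
        }
        std::stable_sort(line_begin, line_end,
                         [](const TextElement& a, const TextElement& b) {
                             return a.x < b.x;
                         });
        line_begin = line_end;
    }
    return elems;
}
```

Sorting in two passes avoids a common pitfall: a single comparator that treats "nearly equal" y values as equal is not a strict weak ordering, so passing it to a standard sort is undefined behavior.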

3 · Modern Archive Handling

  • New docwire_archives library for modular, maintainable and faster archive processing.
  • Archive detection is now MIME-based.
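
As a rough illustration of detecting archives by type rather than by file extension, the sketch below maps well-known file signatures to MIME types. The release notes do not describe how docwire_archives performs detection, so this is only one plausible approach; `detect_archive_mime` is a hypothetical helper, not part of the SDK. It requires C++17.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns the MIME type of a few common archive
// formats based on their leading bytes, or an empty string if none match.
std::string detect_archive_mime(std::string_view data) {
    using namespace std::string_view_literals;
    // True when `magic` appears in `data` starting at `offset`.
    auto has_at = [&](std::size_t offset, std::string_view magic) {
        return data.size() >= offset + magic.size() &&
               data.compare(offset, magic.size(), magic) == 0;
    };
    if (has_at(0, "PK\x03\x04"sv)) return "application/zip";
    if (has_at(0, "\x1F\x8B"sv)) return "application/gzip";
    if (has_at(0, "7z\xBC\xAF\x27\x1C"sv)) return "application/x-7z-compressed";
    // POSIX tar stores the "ustar" marker at byte offset 257.
    if (has_at(257, "ustar"sv)) return "application/x-tar";
    return "";
}
```

Once a MIME type is known, a caller can choose a handler by type, which keeps working for renamed files and for data that arrives as a stream with no file name at all.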

4 · Expanded Format Support

  • Automatic detection for ASP and ASP.NET documents.

Developer-Centric Improvements

  • Plain-text exporter handles page breaks more clearly.
  • CI pipeline moves to windows-2025 runners; AddressSanitizer (ASan) re-enabled on Windows.
  • Broader automated test coverage (OCR, HTTP, CLI).
  • Build fix on Windows: defining NOMINMAX resolves conflicts between windows.h and PDFium.
  • Spacing and line-break corrections in PDF and OCR outputs.
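
For readers unfamiliar with the NOMINMAX fix mentioned above: by default, windows.h defines `min` and `max` as macros, which can break code that calls `std::min`, `std::max` or `std::numeric_limits<T>::max()`. The sketch below shows the general pattern in source form; it is not taken from the DocWire code base, where the definition may instead be passed as a compiler flag (for example `/DNOMINMAX`). The `clamp_to_page` function is a hypothetical example.

```cpp
// Define NOMINMAX before including <windows.h> so that the header
// does not define the min/max macros.
#ifdef _WIN32
#  ifndef NOMINMAX
#    define NOMINMAX
#  endif
#  include <windows.h>
#endif

#include <algorithm>
#include <cassert>

// Hypothetical helper: with the macros suppressed, std::min and std::max
// compile as ordinary function templates on every platform.
int clamp_to_page(int value, int page_size) {
    assert(page_size >= 0);
    return std::max(0, std::min(value, page_size));
}
```

Setting the definition once in the build system is usually more robust than repeating it in each source file, since it also covers headers that include windows.h indirectly.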

Documentation

API docs and module dependency notes are fully up to date.


OCR’s vision, sharp and newly bright

PDF layouts, now a clearer sight

Archives rebuilt, with structure firm and new

DocWire advances, steady, strong, and true


Try It Now

We welcome feedback, issues and PRs.

Next up: deeper LLM integration and vcpkg support.

— The DocWire Team