The Wayback Machine - https://web.archive.org/web/20200525163346/https://github.com/topics/text-extraction

text-extraction

Here are 70 public repositories matching this topic...

gunnsth commented Apr 19, 2019

Currently, the colorspace handling supports only DeviceGray and DeviceRGB. The handling is also simplistic: it only loops through the images in XObject and compresses all of them. For example, an image that is never used in the content stream is still not removed.
This also means that inline images are not handled.

The handling should be made more generic and use the ContentStreamProc
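
To illustrate the difference between looping over the XObject resources and looking at what the content stream actually paints, here is a minimal sketch in Python. It is not the library's code and does not use its API: the function names (`used_xobject_names`, `count_inline_images`, `unused_images`) and the sample data are hypothetical. It relies only on two PDF operators: `/Name Do`, which paints a named XObject, and `BI`, which begins an inline image. The regex scan is deliberately naive and assumes the stream is already decoded; a real content stream processor would tokenize properly, so that bytes inside strings or inline image data are not mistaken for operators, and would also descend into form XObjects.

```python
import re
from typing import Iterable, Set

# One "regular" PDF character: anything that is not whitespace or a delimiter.
_REGULAR = rb"[^\s()<>\[\]{}/%]"

# "/Name Do" paints the XObject called Name.
_DO_RE = re.compile(rb"/(" + _REGULAR + rb"+)\s+Do(?!" + _REGULAR + rb")")

# A standalone "BI" operator begins an inline image.
_BI_RE = re.compile(rb"(?<!" + _REGULAR + rb")BI(?!" + _REGULAR + rb")")


def used_xobject_names(content: bytes) -> Set[str]:
    """Names of the XObjects painted with Do in a decoded content stream."""
    return {m.group(1).decode("latin-1") for m in _DO_RE.finditer(content)}


def count_inline_images(content: bytes) -> int:
    """Number of BI operators found (naive: does not skip string or image data)."""
    return len(_BI_RE.findall(content))


def unused_images(image_names: Iterable[str], content: bytes) -> Set[str]:
    """Image XObject names listed in the resources but never painted."""
    return set(image_names) - used_xobject_names(content)


# Hypothetical page: two image XObjects in the resources, only one is painted,
# plus one inline image that a resource-only loop would never see.
sample_content = (
    b"q 100 0 0 100 0 0 cm /Im1 Do Q\n"
    b"BI /W 1 /H 1 /CS /G /BPC 8 ID \x00 EI\n"
)
sample_images = {"Im1", "Im2"}

if __name__ == "__main__":
    print(sorted(unused_images(sample_images, sample_content)))
    print(count_inline_images(sample_content))
```

In this sample, `Im2` is reported as unused and one inline image is found. Those are the two cases described above that a loop over the XObject images alone would miss.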
