I assume that the size difference is related to ABBY using some commercial magic to be smart about image compression
Don't assume, investigate. The PDF format is well documented, so read up on the details. Open your PDF files in an editor (or just use less), look at how the pages are actually encoded, and find the difference. Or install a package like mupdf-tools, whose mutool command-line tool can extract parts of a PDF file.
An image in a PDF will take up a different amount of space based on the resolution it is stored with (which may or may not be the same as the one it was scanned with), and the compression algorithm.
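To make the resolution part concrete: the effective resolution of an embedded image is its pixel count divided by the size it is drawn at (PDF units are points, 72 per inch), and the size before compression grows with the pixel count. A minimal Python sketch; the function names and the A4 example figures are my own assumptions, chosen only for illustration:

```python
def effective_dpi(pixels: int, drawn_points: float) -> float:
    """Pixels per inch for an image drawn `drawn_points` PDF points wide (72 pt = 1 inch)."""
    return pixels / (drawn_points / 72.0)


def raw_size_bytes(width: int, height: int, components: int = 3, bits: int = 8) -> int:
    """Size of the image samples before any compression filter is applied.

    Each row of samples is padded to a whole number of bytes.
    """
    row_bytes = (width * components * bits + 7) // 8
    return row_bytes * height


# Hypothetical example: an A4 page (about 595 x 842 pt) scanned to 2480 x 3508 pixels.
print(round(effective_dpi(2480, 595.276)))               # effective dpi across the page width
print(raw_size_bytes(2480, 3508))                        # RGB, 8 bits per component
print(raw_size_bytes(2480, 3508, components=1, bits=1))  # 1-bit black and white
```

Comparing these raw sizes with the actual stream lengths in the two PDFs gives a first idea of how much of the difference comes from resolution or colour depth and how much from the compression filter.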
The standard filters defined by the PDF specification are:
- ASCIIHexDecode
- ASCII85Decode
- LZWDecode (Lempel-Ziv-Welch)
- FlateDecode (zlib/deflate)
- RunLengthDecode
and a few that probably don't apply.
So find out what resolution and compression method ABBYY used, then look for tools that can reproduce that (you may need to modify existing tools if they don't do this out of the box).
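As a quick first look, something like the following Python sketch counts the filter names that appear in a PDF. It is a crude byte-level scan, not a PDF parser: it assumes the stream dictionaries are stored as plain text in the file (which is the case for image streams), and the function name count_filters is my own.

```python
import re
import sys
from collections import Counter

# Matches "/Filter /Name" as well as "/Filter [/Name1 /Name2]".
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes: bytes) -> Counter:
    """Count the filter names found after /Filter keys in the raw bytes of a PDF."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 thisscript.py file.pdf
    with open(sys.argv[1], "rb") as fh:
        for name, n in count_filters(fh.read()).most_common():
            print(f"{n:6d}  {name}")
```

Running it on both PDFs and comparing the output should show quickly whether the two files use different filters for their page images.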
From what I understand, this means that ABBY uses the JPXDecode filter with the Mask feature to encode the image, which means that I'd be looking for a linux/FOSS alternative that can do JPXDecode (JPG2000?) compression.
Exactly, JPXDecode is JPEG 2000. Note that lossy JPG-style compression may not be the best method for text: it is geared towards photos, so it does not reproduce the sharp transitions typical of text very well. On the other hand, as these are scans, the transitions may already be blurred by the scanning itself.
Note also that JPG has quite a few parameters that influence compression ratio vs. quality.
So in that case, use mutool to extract a few of the images, then use some other tool (e.g. mediainfo, or identify -verbose from ImageMagick/GraphicsMagick) to find out the parameters of the JPG images.
Also have a very close look at the decompressed JPG image at high magnifications, and decide if the quality is good enough.
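If the extracted image data is JPEG 2000, the basic parameters can also be read straight from the codestream: the SIZ marker segment at its start holds the dimensions, the number of components and the bit depth. A minimal Python sketch under that assumption (the function name read_j2k_siz is mine; it reports only geometry and bit depth, not the quality-related settings, which live in other marker segments):

```python
import struct


def read_j2k_siz(data: bytes) -> dict:
    """Return size, component count and bit depth from a JPEG 2000 codestream.

    Also works on a JP2 file, since it simply searches for the start of the
    embedded codestream.
    """
    # SOC (0xFF4F) is immediately followed by the SIZ marker (0xFF51).
    start = data.find(b"\xff\x4f\xff\x51")
    if start < 0:
        raise ValueError("no JPEG 2000 codestream found")
    # After the two markers: Lsiz, Rsiz (16 bit), eight 32-bit fields, Csiz (16 bit).
    fields = struct.unpack_from(">HH8IH", data, start + 4)
    xsiz, ysiz, xosiz, yosiz = fields[2:6]
    csiz = fields[10]
    ssiz = data[start + 4 + 38]  # precision byte of the first component
    return {
        "width": xsiz - xosiz,
        "height": ysiz - yosiz,
        "components": csiz,
        "bits_per_component": (ssiz & 0x7F) + 1,
        "signed": bool(ssiz & 0x80),
    }
```

Feed it the raw, still-encoded bytes of a JPXDecode stream or an extracted JPEG 2000 file, and compare the reported width and height with the page size to get the stored resolution.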
There should be plenty of open-source tools to create a JPG file from your scanned images in the desired resolution and quality, but I don't know any tool offhand that can pack them into a PDF.