PDF Compression Techniques

Question

I am trying to compress a PDF document in Java. The original file size is 1.5-2 MB and we need to bring it down to less than 1 MB. I tried iText compression on it, but the results are not very effective and the file size is still greater than 1 MB.

byte[] mergedFileContent = byteArrayOS.toByteArray();
reader = new PdfReader(mergedFileContent);
PdfStamper stamper = new PdfStamper(reader, byteArrOScomp);
stamper.setFullCompression();
stamper.close();
reader.close();

Has anyone worked on something similar? Any inputs would be appreciated.

What media / images does your pdf contain and how do you intend to compress them? — Jankapunkt
– Jankapunkt, Commented May 4, 2016 at 6:10
@Jankapunkt The pdf basically consists of text and tabular formatting on it. We don't have any high quality images being rendered on the pdf. I am just looking to reduce it to a size which could be less than 1 MB. — Nishant
– Nishant, Commented May 4, 2016 at 6:45
Are you bound to PdfStamper? As far as I can see, its documentation is not very clear about which compression algorithm it uses. The Adobe PDF specification lists several possible compression algorithms and methods. So if your lib does not let you choose between different compression methods, you may want to switch to another lib. — Jankapunkt
– Jankapunkt, Commented May 4, 2016 at 6:57
You cannot set some random target size and expect software to compress to that size (or less). If it were possible, all compressed files should end up as 1 byte. See data-compression.com/theory.html. — Jongware
– Jongware, Commented May 4, 2016 at 9:26
@Nishant no offence but this should be part of your research competencies. — Jankapunkt
– Jankapunkt, Commented May 4, 2016 at 10:16

mkl · Accepted Answer · 2016-05-04 13:35:31Z

You might want to look into the official iText examples, in particular the sample HelloWorldCompression is about applying different degrees of compression both at initial PDF creation time and as a post-processing step.

The following method from that sample may help you along.

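import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;

public void compressPdf(String src, String dest) throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    // Write as PDF 1.5, which full compression (object and xref streams) requires.
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest), PdfWriter.VERSION_1_5);
    stamper.getWriter().setCompressionLevel(9);
    int total = reader.getNumberOfPages() + 1;
    // Read each page's decoded content and set it again, so it is stored as a newly compressed stream.
    for (int i = 1; i < total; i++) {
        reader.setPageContent(i, reader.getPageContent(i));
    }
    // Use PDF 1.5 object streams and cross-reference streams.
    stamper.setFullCompression();
    stamper.close();
    reader.close();
}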

If you wonder how I found it: I googled for "itextpdf example full compression" and it was the second result. (The first find contains the same method but is not from the official iText site.)
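To get a feel for how much the compression level itself can contribute, here is a minimal sketch using only java.util.zip.Deflater. It assumes that the level passed to setCompressionLevel corresponds to the usual zlib levels 0-9; the class name, the word list, and the generated table-like text are made up for illustration and are not taken from the iText sample.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class DeflateLevels {
    // Hypothetical stand-in for a text-and-table page: random words plus a number per row.
    static byte[] sampleText(int rows) {
        String[] words = {"invoice", "total", "amount", "date", "customer", "item", "price", "tax"};
        Random rng = new Random(42); // fixed seed so runs are repeatable
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < 6; c++) {
                sb.append(words[rng.nextInt(words.length)]).append(' ');
            }
            sb.append(rng.nextInt(10_000)).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    // Returns the number of bytes produced by deflating the input at the given level.
    static int deflatedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[8192];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        byte[] raw = sampleText(5_000);
        System.out.println("raw: " + raw.length + " bytes");
        for (int level : new int[] {1, 6, 9}) {
            System.out.println("level " + level + ": " + deflatedSize(raw, level) + " bytes");
        }
    }
}
```

On text like this, the difference between a low and a high level is usually small compared with the gap between uncompressed and compressed, so raising the level alone is unlikely to rescue a file that is well over the target.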

The PDF is being processed using a third-party tool, and I have tried all the different levels of compression; however, stamper.setFullCompression(); still produces the best result of all.

K J · Accepted Answer · 2024-10-30 20:53:25Z

A file that has already been compressed to its own maximum (as many PDFs already have been: coded, compacted, compressed, and "crypted") cannot be made significantly smaller unless some component is removed, which destroys some of the original qualities or functions.
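The point can be illustrated with a minimal sketch that deflates some text and then deflates the result again. Everything here is hypothetical (the class name, the vocabulary, the generated text standing in for page content); it uses only java.util.zip.Deflater, not any PDF library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class RecompressDemo {
    // Deflates the input at the strongest level and returns the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Hypothetical stand-in for uncompressed content: random words from a small vocabulary.
    static byte[] sampleText(int wordCount) {
        String[] words = {"table", "row", "cell", "total", "name", "date", "value", "page"};
        Random rng = new Random(7); // fixed seed so runs are repeatable
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            sb.append(words[rng.nextInt(words.length)]).append(i % 8 == 7 ? '\n' : ' ');
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleText(50_000);
        byte[] once = deflate(raw);
        byte[] twice = deflate(once); // second pass over already-compressed bytes
        System.out.println("raw:           " + raw.length + " bytes");
        System.out.println("deflated once: " + once.length + " bytes");
        System.out.println("deflated twice: " + twice.length + " bytes");
    }
}
```

The first pass removes most of the redundancy; the second pass has almost nothing left to work with, which is the situation a compressor is in when it is handed an already-compressed PDF.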

As an example, take a file of similar size to the original (1.5 to 2 MB). This one is 1.82 MB (1,916,023 bytes). The PostScript source was only 1.21 KB, so surely it should be possible to reduce it towards that smaller size?

Well, on opening the PDF it is soon clear that it has 4096 pages, and removing any single page would break its function.

We CAN compress it some more, say via an online compressor.

Your PDF are now 17% smaller! 1.83 MB >> 1.53 MB.

That was achieved by optimisation (the number of components was reduced from /Size 12293 to /Size 8413), NOT by compression, which is unchanged (a mix of Zip and Deflate).

I also know that can be bettered. The message "info: optimized 4096 streams, kept 4096 #orig" means there is now only one stream per page, but adding one more wrapper (/Size 8414) reduces the file size to 1.03 MB (1,081,466 bytes).

Decompressed, the total objects are /Size 8412 = 3.13 MB (3,286,658 bytes).

So it is still functional at about 32.9% of the decompressed size. However, it will never get under the OP's desired 1.00 MB without some loss of function.

Halcyon · Accepted Answer · 2017-10-13 17:28:28Z

You could gzip, zip, etc. the file afterwards. It isn't really a PDF compression format, but if you are constrained and want better compression then compressing the entire thing may have good results since it can compress meta-level data.
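In Java this can be done with the standard library alone. The following is a minimal sketch, assuming Java 11 or later; the class and method names are made up for illustration, and the result is a .gz file that has to be unpacked before a PDF viewer can open it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipFile {
    // Compresses the whole source file into a gzip container at target.
    static void gzip(Path source, Path target) throws IOException {
        try (InputStream in = Files.newInputStream(source);
             OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
            in.transferTo(out);
        }
    }

    // Restores the original bytes from a gzip file.
    static void gunzip(Path source, Path target) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(source));
             OutputStream out = Files.newOutputStream(target)) {
            in.transferTo(out);
        }
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Path.of(args[0]);          // path to the PDF to pack
        Path packed = Path.of(args[0] + ".gz");
        gzip(pdf, packed);
        System.out.println(Files.size(pdf) + " -> " + Files.size(packed) + " bytes");
    }
}
```

Run it as java GzipFile.java some.pdf and compare the two sizes it prints; how much is gained depends entirely on how much of the PDF is still uncompressed.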

ArtemGr · Accepted Answer · 2024-10-30 07:32:12Z

PDFs are already compressed in a number of ways, which prevents the external compression utilities from gaining much ground. It should be obvious that if you unpack the PDF, then the external utilities would have an easier time finding redundancies and patterns to compress.

I know of no tool to unpack the PDF without reprinting it, though. Ghostscript can reprint the existing PDF into a new PDF, and we can tell it to avoid compression in that second version:

gs -dCompressPages=false -dCompressFonts=false -dCompressStreams=false -dEncodeColorImages=false -dEncodeGrayImages=false -dEncodeMonoImages=false -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dPDFSETTINGS=/screen -dSubsetFonts=true -dColorImageResolution=96 -dGrayImageResolution=96 -sOutputFile=raw-copy.pdf src.pdf

Even though the resulting copy is large (as it uses no compression), it can be packed more effectively with external tools:

zpaq a raw-copy.pdf.zpaq raw-copy.pdf -m5 -fragment 9
zstd --ultra -22 raw-copy.pdf

A useful side effect is that we can compress together different versions of a text (unpacked PDFs, unpacked EPUBs, HTMLs, DOCs, RTFs and so on), eliminating redundancy and saving storage space across formats.
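Why unpacking helps can be sketched in a few lines of Java. The example below is only an illustration under stated assumptions: the page text is invented, and java.util.zip.Deflater stands in for both the per-stream compression inside a PDF and the external compressor. It compares compressing many similar pages one by one with compressing their raw concatenation once.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class UnpackThenCompress {
    // Deflates the input at the strongest level and returns the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Hypothetical page content: mostly shared boilerplate plus a page number.
    static byte[] page(int number, int pages) {
        String text = "BT /F1 10 Tf 72 760 Td (Quarterly report: confidential) Tj ET\n"
                + "BT /F1 10 Tf 72 740 Td (Name        Quantity    Unit price) Tj ET\n"
                + "BT /F1 10 Tf 72 720 Td (Widget      " + (number % 50) + "          9.99) Tj ET\n"
                + "BT /F1 8 Tf 72 40 Td (Page " + number + " of " + pages + ") Tj ET\n";
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    // Total size when every page is deflated on its own, like separate streams in a PDF.
    static int separatelyCompressed(int pages) {
        int total = 0;
        for (int i = 1; i <= pages; i++) {
            total += deflate(page(i, pages)).length;
        }
        return total;
    }

    // Size when the raw pages are concatenated first and deflated once.
    static int compressedTogether(int pages) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (int i = 1; i <= pages; i++) {
            byte[] p = page(i, pages);
            all.write(p, 0, p.length);
        }
        return deflate(all.toByteArray()).length;
    }

    public static void main(String[] args) {
        int pages = 200;
        System.out.println("per-page streams: " + separatelyCompressed(pages) + " bytes");
        System.out.println("one pass over raw: " + compressedTogether(pages) + " bytes");
    }
}
```

When each page is compressed in isolation, the boilerplate shared between pages is paid for again on every page; a single pass over the uncompressed data can refer back to earlier pages instead. Stronger external compressors such as zpaq or zstd exploit the same effect over much longer distances.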

Running this command does not copy bookmarks that are present in the original PDF file.
Thanks for the info. Rewriting PDFs with gs is never going to be perfect; it is clearly a lossy compression, and you should of course tweak the command to your needs. Personally, I have a script which tries multiple ways of compression, allowing me to pick the best tradeoff on a case-by-case basis.
