I am trying to copy an existing PDF file into a new file using the itextpdf library in Java. I am using version 5.5.10 of itextpdf. I am facing different issues with both approaches: PdfStamper and PdfCopy. When I use the PdfStamper class, the new file is much larger than the original, although nothing new was added. Here is the code:

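    // Backslashes in Java string literals must be escaped
    String currFile = "C:\\misc\\pdffiles\\AcroJS.pdf";
    String dest = "C:\\misc\\pdffiles\\AcroJS_copy.pdf";
    PdfReader reader = new PdfReader(currFile);
    // Closing the stamper without making changes writes a full copy to dest
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();
    reader.close();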

Some observations: a file of about 7 MB (original) grew to about 13 MB (new file), and one of 116 KB grew to about 119 KB.

I was expecting approximately the same file size when just copying an existing PDF file. I cannot figure out why the size increases that much.

I have tried the PdfCopy class as well. I followed two approaches with PdfCopy:

  1. Copy page by page.
  2. Call setMergeFields() on the PdfCopy object, then call pdfcopy.addDocument(reader).

But the problem with both approaches is that PdfCopy throws away some non-content metadata from the PDF file, and hence the new PDF is broken when opened in Adobe Reader. For example, my PDF contains a dictionary object PdfName.S. In this case the newly created PDF file is just 2 KB (the original was 1.6 MB), which clearly means nothing was copied into the document and it is broken.

My original requirement is very simple: copy an existing PDF to a new PDF file, without an increase in size and without throwing away necessary items. Obviously a plain copy, paste, and rename will not do, because in the next step I have some processing to do on the PDF content. Any help will be much appreciated.

OS: Windows 10 Pro, Java: 1.8.101, iText: 5.5.10

thanks

  • Please share the PDF in question to allow reproducing the issue. Commented Dec 16, 2016 at 11:57
  • @mkl, please check the PDF at [pdfill.com/download/AcroJS.pdf] to test with PdfStamper. Its original size is around 7 MB; with PdfStamper, it is converted to around 13 MB. For PdfCopy, let me first find a suitable example. I cannot share the one I have right now. Commented Dec 16, 2016 at 16:28
  • Did my answer solve your issue? Then please accept it. Or were there problems? Then please describe them in a comment. Commented Jan 2, 2017 at 11:18

1 Answer

Use of PdfStamper

Your code

    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    stamper.close();

essentially tells iText to copy the original PDF, throwing away unused objects and using iText's default compression settings.

By default, iText does not use compressed cross reference streams and object streams (introduced in PDF 1.5). It uses the older technique of cross reference tables and individually compressed objects.

The sample file, on the other hand, does use cross reference streams and object streams. Thus, it is much better compressed.
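To get a feel for why packing objects into a stream helps, here is a minimal sketch using only `java.util.zip.Deflater`. It is not how iText writes PDFs. The class `ObjectStreamSketch`, its methods, and the sample dictionaries are hypothetical, and a real object stream stores objects without the `obj`/`endobj` wrappers plus an offset table. The sketch only compares the size of many small dictionary objects written as plain text against the same bytes deflated together as one stream.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Deflater;

// Illustration only: hypothetical names, not part of iText.
class ObjectStreamSketch {

    /** Deflates the input and returns the compressed length in bytes. */
    static int deflatedSize(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Builds made-up small dictionary objects, roughly shaped like form field widgets. */
    static List<String> sampleObjects(int count) {
        List<String> objects = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            objects.add(i + " 0 obj\n<< /Type /Annot /Subtype /Widget /FT /Tx /T (field" + i
                    + ") /Rect [10 " + (i * 20) + " 200 " + (i * 20 + 15) + "] >>\nendobj\n");
        }
        return objects;
    }

    /** Total size when every object is written as uncompressed text. */
    static int plainSize(List<String> objects) {
        int total = 0;
        for (String object : objects) {
            total += object.getBytes(StandardCharsets.ISO_8859_1).length;
        }
        return total;
    }

    /** Size when all objects are concatenated and deflated as a single stream. */
    static int packedSize(List<String> objects) {
        String joined = String.join("", objects);
        return deflatedSize(joined.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        List<String> objects = sampleObjects(500);
        System.out.println("plain bytes:  " + plainSize(objects));
        System.out.println("packed bytes: " + packedSize(objects));
    }
}
```

A document with many small dictionaries, such as a form with many fields, can benefit noticeably from this kind of packing, which fits the size difference observed for the sample file.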

Code with full compression

You can tell iText to use these improved compression techniques, too, like this:

    PdfReader reader = new PdfReader(resourceStream);
    PdfStamper stamper = new PdfStamper(reader, outputStream);
    stamper.setFullCompression();

    stamper.close();

(Stamping.java test method testStampAcroJSCompressed)

This results in a file less than 4 MB in size.

Code with append mode

If you want to remain faithful to the original way objects were stored, you can instead use the append mode, which identically copies the original file and adds changes in the form of a so-called incremental update, like this:

    PdfReader reader = new PdfReader(resourceStream);
    PdfStamper stamper = new PdfStamper(reader, outputStream, '\0', true);

    stamper.close();

(Stamping.java test method testStampAcroJSAppended)

This results in a file slightly larger than the original file.
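One way to convince yourself that an output file really is an incremental update is to check that it begins with the exact bytes of the original. The following is a small sketch using only the Java standard library. The class name `AppendModeCheck` and the method `startsWithOriginal` are hypothetical helpers, not iText API, and the two file paths are expected as command-line arguments.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Illustration only: hypothetical helper, not part of iText.
class AppendModeCheck {

    /** Returns true if 'updated' begins with every byte of 'original', in order. */
    static boolean startsWithOriginal(byte[] original, byte[] updated) {
        if (updated.length < original.length) {
            return false;
        }
        for (int i = 0; i < original.length; i++) {
            if (original[i] != updated[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java AppendModeCheck <original.pdf> <updated.pdf>");
            return;
        }
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        byte[] updated = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("original is a byte prefix of updated: "
                + startsWithOriginal(original, updated));
        System.out.println("bytes added: " + (updated.length - original.length));
    }
}
```

For a file written in append mode one would expect the prefix check to succeed and the number of added bytes to be small. For a file written in the default mode the check should normally fail, because the whole file is rewritten.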

Use of PdfCopy

You observed that PdfCopy "is throwing away some non-content metadata".

Of course it does. PdfCopy is designed to copy pages from one PDF to another, keeping content and annotations as they were but ignoring other page-level and all document-level information.

2 Comments

Thanks a lot, @mkl. The approaches you suggested for PdfStamper solved my problem! I really appreciate the simple explanations of the solution and the GitHub link for it. I was wondering if you have something for apache PdfBox for this same problem. Thanks.
@psp "if you have something for apache PdfBox for this same problem" - I'm not aware of an option of Apache PDFBox which allows the equivalent of full compression. And saving a change as incremental update (which iText does in append mode) with PDFBox is a PITA for anything but the special case of signing.
