33

I know I have done this before, so I'm sure it's possible; I just forget how. There's a way to tell convert to grab a specific page of a PDF, and I'd like to keep that page in PDF format.

3 Answers

38

You can use subscript notation with convert(1) to "index" into a PDF:

$ convert source.pdf[1] dest.pdf 

The index value depends on how the PDF exporter numbered the pages. In tests on files here, the numbers seem to be zero-based, so the above example gets you the second page in the document. I've seen examples online where they show letter indexes instead, since apparently the PDF creator "numbered" the pages in that document that way instead.

Unfortunately, this doesn't give very good results, because ImageMagick assumes everything is pixel-based, and therefore rasterizes vector imagery, such as the typography in a typical PDF.
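If rasterized output is acceptable, the bracket indexing can be wrapped in a small helper. This is only a sketch: it assumes the indexes are zero-based (as they seemed to be in my tests), the function names are made up for illustration, and the default density of 300 is simply the value suggested in the comments below.

```python
def page_selector(pdf_path, first_page, last_page=None):
    """Build an ImageMagick input spec such as 'source.pdf[1]'.

    Page numbers are one-based (as a human counts them); the bracket
    index is assumed to be zero-based.
    """
    if first_page < 1:
        raise ValueError("page numbers start at 1")
    if last_page is None or last_page == first_page:
        return f"{pdf_path}[{first_page - 1}]"
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    return f"{pdf_path}[{first_page - 1}-{last_page - 1}]"


def convert_command(src, dest, first_page, last_page=None, density=300):
    """Return the argument list for convert; -density must precede the input."""
    return ["convert", "-density", str(density),
            page_selector(src, first_page, last_page), dest]
```

Passing the resulting list to subprocess.run(..., check=True) runs the command without going through a shell, so the brackets need no quoting.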

A better tool for the job is Ghostscript, which you probably already have installed:

$ gs -dNOPAUSE -dBATCH -dFirstPage=2 -dLastPage=2 -sDEVICE=pdfwrite \
    -sOutputFile=dest.pdf -f src.pdf

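This keeps the page as vector data instead of rasterizing it, since Ghostscript understands PDF (a PostScript derivative) at a much deeper level than ImageMagick does. Note that the pdfwrite device interprets the page and writes a new PDF, so the result is visually equivalent to the original page rather than a byte-for-byte copy of it.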
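If you do this often, the Ghostscript call is easy to generate from a script. A minimal sketch, using only the flags from the command above; the function name gs_extract_command is my own invention, and page numbers are one-based as with -dFirstPage and -dLastPage.

```python
def gs_extract_command(src, dest, first_page, last_page=None):
    """Return the gs argument list that copies pages first_page..last_page.

    Page numbers are one-based. With last_page omitted, a single page
    is extracted.
    """
    if last_page is None:
        last_page = first_page
    if first_page < 1 or last_page < first_page:
        raise ValueError("need 1 <= first_page <= last_page")
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-dFirstPage={first_page}",
        f"-dLastPage={last_page}",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dest}",
        "-f", src,
    ]
```

The list can be handed to subprocess.run(..., check=True), which raises an error if gs exits with a non-zero status.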

6
  • 2
    actually that's not true about imagemagick, if you set the -density parameter to something around 300-400 then the outputted text from the pdf in the png will look just fine. Commented Aug 22, 2012 at 23:19
  • 6
    It'll look fine on screen, sure, but if you then go to print, you'll want to set the density even higher. And then, you're likely to run into trouble with how your printer's RIP copes with the gray antialiasing pixels output by ImageMagick. So you can then choose instead to output to 1-bit B&W at your printer's native resolution, which might be 1,200 dpi, or 1,440 dpi or something else, and you have to know that in advance to get sharp output. No, I'll stand by my statement: best to keep PDF data in vector form as long as possible. Commented Aug 23, 2012 at 2:21
  • @buggedcom I've found -density 300 is the sweet spot. Anything larger and you're creating huge temp files - which you're probably going to resize down to thumbnails anyway Commented Dec 16, 2013 at 3:26
  • 2
    You can also select a range of pages (e.g. for making a gif) like so source.pdf[3-6] Commented May 19, 2016 at 20:09
  • 1
    convert extracts the pages but the resulting pdf files are blurry. If you set density 300 then the resulting pdf files are huge. As @WarrenYoung pointed out it is best to use Ghostscript. Very fast and resulting file is as good as the original one. Commented Dec 6, 2022 at 6:59
28

ImageMagick is a tool for bitmap images, which most PDFs aren't. If you use it, it will rasterize the data, which is often not desirable.

Pdftk can extract one or more pages from a PDF file.

pdftk A=input.pdf cat A42 A43 output pages_42_43.pdf

If you have a LaTeX installation with pdfLaTeX, you can use the pdfpages package. pdfjam is a shell wrapper around pdfpages.

pdfjam -o pages_42_43.pdf input.pdf 42,43

Another possibility (overkill here, but useful for requirements more complex than one page) is Python with the pyPdf library.

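#!/usr/bin/env python
# Reads a PDF on stdin and writes the selected pages to stdout.
# Uses the legacy pyPdf API. Example: python thisscript.py < input.pdf > pages_42_43.pdf
import sys
from pyPdf import PdfFileWriter, PdfFileReader
reader = PdfFileReader(sys.stdin)  # stdin must be seekable, so redirect from a file
writer = PdfFileWriter()
for page_number in [42, 43]:
    # getPage() is zero-based, so subtract 1 from the human page number
    writer.addPage(reader.getPage(page_number - 1))
writer.write(sys.stdout)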
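The pyPdf API above dates from Python 2. On Python 3 the same idea can be sketched with its successor, pypdf. This assumes a reasonably recent pypdf is installed (pip install pypdf); the function names zero_based_indexes and extract_pages are just illustrative.

```python
def zero_based_indexes(page_numbers, page_count):
    """Convert one-based page numbers to zero-based indexes, checking bounds."""
    indexes = []
    for number in page_numbers:
        if not 1 <= number <= page_count:
            raise ValueError(f"page {number} is outside 1..{page_count}")
        indexes.append(number - 1)
    return indexes


def extract_pages(src_path, dest_path, page_numbers):
    """Copy the given one-based pages of src_path into a new PDF at dest_path."""
    # Third-party dependency, imported here so the helper above works without it.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in zero_based_indexes(page_numbers, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dest_path, "wb") as out_file:
        writer.write(out_file)
```

For example, extract_pages("input.pdf", "pages_42_43.pdf", [42, 43]) would write the same two pages as the pdftk and pdfjam commands above.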
3
  • I was about to recommend pdftk as well. You will want to use it. Commented Jun 10, 2011 at 8:12
  • pdfjam works like a charm, and was already installed with my LaTeX distribution. It is very easy to use. Commented Sep 9, 2016 at 14:12
  • Thanks a lot. The extracted page was larger than the complete pdf with pdftk so it doesn't seem to simply extract a page. The result was fine otherwise. Commented Jul 5, 2018 at 10:11
3

This Q&A is from 2011. As of 2021, I think the most stable and well-maintained option for this purpose is qpdf:

qpdf input.pdf --pages . 12 -- output.pdf

Page numbering seems to start from 1, but I haven't checked how this works when the pdf file has page numbering metadata.
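For scripting, the qpdf call can be assembled from a list of page numbers. A rough sketch, assuming one-based page numbers as above; the function name qpdf_extract_command is made up for this example. qpdf accepts a comma-separated list of pages as the range argument.

```python
def qpdf_extract_command(src, dest, page_numbers):
    """Return the qpdf argument list that keeps only the given one-based pages."""
    pages = list(page_numbers)
    if not pages:
        raise ValueError("at least one page number is required")
    if any(number < 1 for number in pages):
        raise ValueError("page numbers start at 1")
    # qpdf takes a comma-separated page list, e.g. "42,43".
    page_range = ",".join(str(number) for number in pages)
    # "." tells qpdf to take the pages from the primary input file.
    return ["qpdf", src, "--pages", ".", page_range, "--", dest]
```

Running the list with subprocess.run(..., check=True) avoids shell quoting issues with file names that contain spaces.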

I did this using pdftk for many years, but pdftk is poorly engineered and depends on an obsolete version of a library.
