Use convert to grab a specific page from a PDF file?

Question

I know I have done this before, so I'm sure it's possible; I just forget how. There's a way to tell convert to grab a specific page of a PDF, and I'd like the extracted page to stay in PDF format.

Warren Young · Accepted Answer · 2011-06-10 07:59:22Z

You can use subscript notation with convert(1) to "index" into a PDF:

$ convert source.pdf[1] dest.pdf

In tests on files here, the index is zero-based, so the above example gets you the second page in the document. I've seen examples online that show letter indexes instead, apparently because the PDF creator "numbered" the pages in that document that way.
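If you script this, the off-by-one between human page numbers and the zero-based subscript is easy to get wrong. Here is a minimal Python sketch that builds the input spec for convert from 1-based page numbers. The function name `page_selector` is made up for illustration and is not part of ImageMagick.

```python
def page_selector(pdf_path, first_page, last_page=None):
    """Return an ImageMagick input spec such as 'source.pdf[1]'.

    Page numbers are given 1-based (page 1 is the first page) and are
    converted to the zero-based subscript described above.
    """
    if first_page < 1:
        raise ValueError("page numbers start at 1")
    if last_page is None or last_page == first_page:
        return f"{pdf_path}[{first_page - 1}]"
    if last_page < first_page:
        raise ValueError("last_page must not be smaller than first_page")
    # A range of pages is written as [first-last], still zero-based.
    return f"{pdf_path}[{first_page - 1}-{last_page - 1}]"


print(page_selector("source.pdf", 2))  # source.pdf[1]
```

The result is meant to be passed to convert as the input file argument, for example as one element of an argument list given to `subprocess.run`.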

Unfortunately, this doesn't give very good results, because ImageMagick assumes everything is pixel-based, and therefore rasterizes vector imagery, such as the typography in a typical PDF.

A better tool for the job is Ghostscript, which you probably already have installed:

$ gs -dNOPAUSE -dBATCH -dFirstPage=2 -dLastPage=2 -sDEVICE=pdfwrite \
    -sOutputFile=dest.pdf -f src.pdf

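This keeps the page's text and graphics in vector form instead of rasterizing them, since Ghostscript understands PDF (a PostScript derivative) at a much deeper level than ImageMagick does. Note that the pdfwrite device regenerates the PDF rather than copying the page byte for byte.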
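To reuse the Ghostscript command for arbitrary pages, something like the following Python sketch could build the argument list. It only assembles the same options shown above; the helper name `gs_extract_command` is hypothetical, and running it assumes gs is on your PATH.

```python
import shlex


def gs_extract_command(src, dest, first_page, last_page=None):
    """Build the gs argument list that extracts pages first_page..last_page.

    Ghostscript's -dFirstPage and -dLastPage are 1-based and inclusive.
    """
    if last_page is None:
        last_page = first_page
    if not 1 <= first_page <= last_page:
        raise ValueError("need 1 <= first_page <= last_page")
    return [
        "gs", "-dNOPAUSE", "-dBATCH",
        f"-dFirstPage={first_page}", f"-dLastPage={last_page}",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={dest}",
        "-f", src,
    ]


# Print the command instead of running it, so the sketch has no side effects.
print(shlex.join(gs_extract_command("src.pdf", "dest.pdf", 2)))
```

To actually run it, pass the list to `subprocess.run(cmd, check=True)`. Using a list rather than a shell string avoids quoting problems with file names that contain spaces.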

edited Jun 10, 2011 at 7:59

answered Jun 9, 2011 at 1:54

Actually, that's not true about ImageMagick: if you set the -density parameter to something around 300-400, the text rendered from the PDF into the PNG will look just fine.

– buggedcom, 2012-08-22 23:19:28 +00:00

It'll look fine on screen, sure, but if you then go to print, you'll want to set the density even higher. And then, you're likely to run into trouble with how your printer's RIP copes with the gray antialiasing pixels output by ImageMagick. So you can then choose instead to output to 1-bit B&W at your printer's native resolution, which might be 1,200 dpi, or 1,440 dpi or something else, and you have to know that in advance to get sharp output. No, I'll stand by my statement: best to keep PDF data in vector form as long as possible.

– Warren Young, 2012-08-23 02:21:42 +00:00

@buggedcom I've found -density 300 is the sweet spot. Anything larger and you're creating huge temp files, which you're probably going to resize down to thumbnails anyway.

– Mike Causer, 2013-12-16 03:26:41 +00:00

You can also select a range of pages (e.g. for making a GIF) like so: source.pdf[3-6]

– texasflood, 2016-05-19 20:09:22 +00:00

convert extracts the pages, but the resulting PDF files are blurry. If you set -density 300, the resulting PDF files are huge. As @WarrenYoung pointed out, it is best to use Ghostscript: it is very fast, and the resulting file is as good as the original.

– tbaskan, 2022-12-06 06:59:44 +00:00


Gilles 'SO- stop being evil' · Accepted Answer · 2011-06-09 23:45:01Z

ImageMagick is a tool for bitmap images, which most PDFs aren't. If you use it, it will rasterize the data, which is often not desirable.

Pdftk can extract one or more pages from a PDF file.

pdftk A=input.pdf cat A42 A43 output pages_42_43.pdf
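For more than a couple of pages, typing out each `A<n>` argument gets tedious. A small Python sketch that generates the same pdftk invocation from a list of page numbers might look like this. The helper name `pdftk_cat_command` is invented here, and the sketch only covers the single-input `A=` handle form used above.

```python
import shlex


def pdftk_cat_command(src, pages, dest):
    """Build a pdftk argument list that copies the given 1-based pages, in order."""
    pages = list(pages)
    if not pages or any(p < 1 for p in pages):
        raise ValueError("pages must be a non-empty list of numbers >= 1")
    # Each page is referenced through the handle A assigned to the input file.
    return ["pdftk", f"A={src}", "cat", *[f"A{p}" for p in pages], "output", dest]


# Print the command instead of running it.
print(shlex.join(pdftk_cat_command("input.pdf", [42, 43], "pages_42_43.pdf")))
```

The list can be handed to `subprocess.run(cmd, check=True)` if pdftk is installed.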

If you have a LaTeX installation with pdfLaTeX, you can use the pdfpages package. pdfjam is a shell wrapper around pdfpages:

pdfjam -o pages_42_43.pdf input.pdf 42,43

Another possibility (overkill here, but useful for requirements more complex than one page) is Python with the pyPdf library.

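#!/usr/bin/env python
# Written for Python 2 and the legacy pyPdf library.
# Usage: script.py <input.pdf >pages_42_43.pdf
# (stdin must be a seekable file, not a pipe)
import sys
from pyPdf import PdfFileWriter, PdfFileReader

reader = PdfFileReader(sys.stdin)
writer = PdfFileWriter()
# getPage() is zero-based, so indexes 41 and 42 are pages 42 and 43.
for i in [41, 42]:
    writer.addPage(reader.getPage(i))
writer.write(sys.stdout)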

answered Jun 9, 2011 at 23:45


I was about to recommend pdftk as well. You will want to use it.

– Sebastian, 2011-06-10 08:12:38 +00:00

pdfjam works like a charm, and was already installed with my LaTeX distribution. It is very easy to use.

– hdl, 2016-09-09 14:12:24 +00:00

Thanks a lot. The page extracted with pdftk was larger than the complete PDF, so it doesn't seem to simply extract a page. The result was fine otherwise.

– Eric Duminil, 2018-07-05 10:11:05 +00:00


user39248 · Accepted Answer · 2021-05-28 16:08:45Z

This Q&A is from 2011. As of 2021, I think the most stable and well-maintained option for this purpose is qpdf:

qpdf input.pdf --pages . 12 -- output.pdf

Page numbering seems to start from 1, but I haven't checked how this works when the PDF file has page numbering metadata.
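If you want to pull several pages at once, qpdf accepts a comma-separated list in place of the single page number. The following Python sketch builds such a command line. The helper name `qpdf_pages_command` is hypothetical, and the sketch assumes plain 1-based page numbers only, not qpdf's other range forms.

```python
import shlex


def qpdf_pages_command(src, dest, pages):
    """Build a qpdf argument list that keeps only the given 1-based pages."""
    pages = list(pages)
    if not pages or any(p < 1 for p in pages):
        raise ValueError("pages must be a non-empty list of numbers >= 1")
    page_spec = ",".join(str(p) for p in pages)
    # "." after --pages refers back to the primary input file.
    return ["qpdf", src, "--pages", ".", page_spec, "--", dest]


# Print the command instead of running it.
print(shlex.join(qpdf_pages_command("input.pdf", "output.pdf", [12])))
```

As with the other tools, the list can be passed to `subprocess.run(cmd, check=True)` once qpdf is installed.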

I did this using pdftk for many years, but pdftk is poorly engineered and depends on an obsolete version of a library.

answered May 28, 2021 at 16:08

