I know I have done this before, so I'm sure it's possible; I just forget how. There's a way to tell convert to grab a specific page of a PDF, and I'd like to keep that page in PDF format.
3 Answers
You can use subscript notation with convert(1) to "index" into a PDF:
$ convert source.pdf[1] dest.pdf
The index value depends on how the PDF exporter numbered the pages. In tests on files here, the numbers seem to be zero-based, so the above example gets you the second page in the document. I've seen examples online that show letter indexes instead, apparently because the PDF creator "numbered" the pages in that document that way.
Unfortunately, this doesn't give very good results, because ImageMagick assumes everything is pixel-based, and therefore rasterizes vector imagery, such as the typography in a typical PDF.
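If a rasterized copy is good enough, the subscript can be generated rather than typed by hand. The following is only a sketch: `convert_args` is a made-up helper name, the subscript is assumed to be zero-based as observed above, and the default density of 300 is an assumption rather than a recommendation from this answer.

```python
def convert_args(src, dest, page_number, density=300):
    """Build a convert(1) argument list for one page, given a 1-based page number."""
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    index = page_number - 1  # assumption: the [n] subscript is zero-based
    # -density goes before the input file so it applies while the PDF is rasterized.
    return ["convert", "-density", str(density), "%s[%d]" % (src, index), dest]
```

Passing the list to `subprocess.run(convert_args("source.pdf", "dest.pdf", 2), check=True)` also avoids having the shell interpret the square brackets.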
A better tool for the job is Ghostscript, which you probably already have installed:
$ gs -dNOPAUSE -dBATCH -dFirstPage=2 -dLastPage=2 -sDEVICE=pdfwrite \
-sOutputFile=dest.pdf -f src.pdf
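This keeps the page content as vector data instead of rasterizing it, since Ghostscript understands PDF (a PostScript derivative) at a much deeper level than ImageMagick does. Note that the pdfwrite device regenerates the PDF rather than copying it byte for byte, so the output is not identical to the input at the file level.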
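For scripted use, the same Ghostscript invocation can be assembled from Python. This is a minimal sketch, not something from the answer itself: `gs_extract_args` and `extract_pages` are hypothetical names, `gs` is assumed to be on the PATH, and `-q` is added only to quiet the startup banner. `-dFirstPage` and `-dLastPage` take 1-based page numbers.

```python
import subprocess


def gs_extract_args(src, dest, first, last=None):
    """Build the Ghostscript argument list for copying pages first..last (1-based)."""
    if last is None:
        last = first  # a single page
    if first < 1 or last < first:
        raise ValueError("need 1 <= first <= last")
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH",
        "-dFirstPage=%d" % first,
        "-dLastPage=%d" % last,
        "-sDEVICE=pdfwrite",
        "-sOutputFile=%s" % dest,
        "-f", src,
    ]


def extract_pages(src, dest, first, last=None):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(gs_extract_args(src, dest, first, last), check=True)
```

Calling `extract_pages("src.pdf", "dest.pdf", 2)` would then correspond to the command shown above.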
- actually that's not true about imagemagick, if you set the -density parameter to something around 300-400 then the outputted text from the pdf in the png will look just fine. – buggedcom, Aug 22, 2012 at 23:19
- It'll look fine on screen, sure, but if you then go to print, you'll want to set the density even higher. And then, you're likely to run into trouble with how your printer's RIP copes with the gray antialiasing pixels output by ImageMagick. So you can then choose instead to output to 1-bit B&W at your printer's native resolution, which might be 1,200 dpi, or 1,440 dpi or something else, and you have to know that in advance to get sharp output. No, I'll stand by my statement: best to keep PDF data in vector form as long as possible. – Warren Young, Aug 23, 2012 at 2:21
- @buggedcom I've found -density 300 is the sweet spot. Anything larger and you're creating huge temp files, which you're probably going to resize down to thumbnails anyway. – Mike Causer, Dec 16, 2013 at 3:26
- You can also select a range of pages (e.g. for making a gif) like so: source.pdf[3-6] – texasflood, May 19, 2016 at 20:09
- convert extracts the pages but the resulting pdf files are blurry. If you set density 300 then the resulting pdf files are huge. As @WarrenYoung pointed out it is best to use Ghostscript. Very fast and resulting file is as good as the original one. – tbaskan, Dec 6, 2022 at 6:59
ImageMagick is a tool for bitmap images, which most PDFs aren't. If you use it, it will rasterize the data, which is often not desirable.
Pdftk can extract one or more pages from a PDF file.
pdftk A=input.pdf cat A42 A43 output pages_42_43.pdf
If you have a LaTeX installation with PDFLaTeX, you can use pdfpages. There's a shell wrapper for pdfpages, pdfjam.
pdfjam -o pages_42_43.pdf input.pdf 42,43
Another possibility (overkill here, but useful for requirements more complex than one page) is Python with the pyPdf library.
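#!/usr/bin/env python
# Written for Python 2 and the legacy pyPdf package.
# Usage: script.py < input.pdf > pages_42_43.pdf
import sys
from pyPdf import PdfFileWriter, PdfFileReader

reader = PdfFileReader(sys.stdin)
writer = PdfFileWriter()
# getPage() counts from zero, so indexes 41 and 42 are pages 42 and 43.
for i in [41, 42]:
    writer.addPage(reader.getPage(i))
writer.write(sys.stdout)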
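pyPdf is no longer maintained; its successor is distributed as pypdf. The sketch below expresses the same idea with the newer API, assuming pypdf is installed. The helper names `to_indexes` and `extract` are made up for illustration. It accepts 1-based page numbers and converts them, because `reader.pages` is indexed from zero.

```python
def to_indexes(page_numbers, page_count):
    """Convert 1-based page numbers to zero-based indexes, checking the range."""
    indexes = []
    for number in page_numbers:
        if not 1 <= number <= page_count:
            raise ValueError("page %d is outside 1..%d" % (number, page_count))
        indexes.append(number - 1)
    return indexes


def extract(src, dest, page_numbers):
    """Copy the given 1-based pages of src into a new PDF written to dest."""
    # Imported here because pypdf is a third-party package (assumed installed).
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in to_indexes(page_numbers, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dest, "wb") as handle:
        writer.write(handle)
```

For example, `extract("input.pdf", "pages_42_43.pdf", [42, 43])` would produce the same two pages as the pdftk and pdfjam commands above.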
- I was about to recommend pdftk as well. You will want to use it. – Sebastian, Jun 10, 2011 at 8:12
- pdfjam works like a charm, and was already installed with my LaTeX distribution. It is very easy to use. – hdl, Sep 9, 2016 at 14:12
- Thanks a lot. The extracted page was larger than the complete pdf with pdftk so it doesn't seem to simply extract a page. The result was fine otherwise. – Eric Duminil, Jul 5, 2018 at 10:11
This Q&A is from 2011. As of 2021, I think the most stable and well-maintained option for this purpose is qpdf:
qpdf input.pdf --pages . 12 -- output.pdf
Page numbering seems to start from 1, but I haven't checked how this works when the pdf file has page numbering metadata.
I did this using pdftk for many years, but pdftk is poorly engineered and depends on an obsolete version of a library.
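The qpdf command can be wrapped the same way for use in scripts. This is a rough sketch under a few assumptions: `qpdf_extract_args` and `qpdf_extract` are hypothetical names, `qpdf` is assumed to be on the PATH, and the range string is passed through to qpdf without validation.

```python
import subprocess


def qpdf_extract_args(src, dest, page_range):
    """Build a qpdf argument list; page_range is a range such as 12 or "3-6"."""
    # "." tells qpdf to take the pages from the primary input file (src).
    return ["qpdf", src, "--pages", ".", str(page_range), "--", dest]


def qpdf_extract(src, dest, page_range):
    """Run qpdf; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(qpdf_extract_args(src, dest, page_range), check=True)
```

With this, `qpdf_extract("input.pdf", "output.pdf", 12)` corresponds to the command shown above.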