I want to create a script that takes a PDF file as input and takes a screenshot of every page at actual size (100%). Usage would look like this:
custom_pdfcapture example.PDF
After execution, the screenshots should be in the same directory as example.PDF, with filenames following the format ${name_of_pdf}_${page_number}.pdf. What package(s)/command(s) should I look into for this task?
1 Answer
The pdfseparate tool of the poppler-utils package can extract single pages of an input pdf file.
Example:
pdfseparate example.pdf example_%02d.pdf
This splits example.pdf into single-page files example_01.pdf, example_02.pdf, ..., where %02d is a printf-style placeholder for the page number.
The tools pdftocairo and pdftoppm of the poppler-utils package can be used to create images of an input pdf file.
Examples:
pdftocairo -r 300 -png example.pdf
pdftocairo -scale-to-x 800 -scale-to-y -1 -png example.pdf
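Both commands render all pages of the given document as PNG images named example-01.png, example-02.png, ... (the page number is zero-padded according to the total page count, so a document with fewer than ten pages yields example-1.png, example-2.png, ...).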
The first command sets the x and y resolution to 300 PPI (the default is 150 PPI). The second fixes the output width at 800 pixels (-scale-to-x 800) and derives the output height from the page's aspect ratio (-scale-to-y -1).
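To make the relation between -r and the resulting image size concrete: PDF lengths are measured in points (1/72 inch), so a page dimension in pixels is roughly points × PPI / 72. The following is a minimal sketch of that arithmetic. pt_to_px is a hypothetical helper name, not part of poppler-utils, and the renderer's own rounding may differ by a pixel.

```shell
# Hypothetical helper: convert a length in PDF points to pixels
# at a given resolution (PPI). 1 point = 1/72 inch.
pt_to_px() {
    awk -v pt="$1" -v ppi="$2" 'BEGIN { printf "%.0f\n", pt * ppi / 72 }'
}

pt_to_px 612 300   # US Letter width (612 pt) at 300 PPI: prints 2550
pt_to_px 792 300   # US Letter height (792 pt) at 300 PPI: prints 3300
```

By the same relation, what counts as "actual size (100%)" on screen depends on the display's pixel density. If you assume 96 PPI, a 612 pt wide page comes out 816 pixels wide.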
You could use -jpeg or -tiff instead of -png to generate JPEG (see -jpegopt to change the JPEG compression level) or TIFF images.
If your document's MediaBox is larger than its CropBox (the area Acrobat would display and print), add the -cropbox option.
You can check the box sizes with pdfinfo, which is also included in the package:
pdfinfo -box example.pdf
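If you want to automate that check, the sketch below reads the pdfinfo -box output and prints -cropbox only when the two boxes differ. cropbox_flag is a hypothetical helper name, and the sketch assumes the output contains lines starting with "MediaBox:" and "CropBox:" followed by four numbers, which is what pdfinfo prints when no page range is given.

```shell
# Hypothetical helper: read `pdfinfo -box` output on stdin and print
# "-cropbox" if the CropBox differs from the MediaBox, nothing otherwise.
cropbox_flag() {
    awk '
        $1 == "MediaBox:" { m = $2 " " $3 " " $4 " " $5 }
        $1 == "CropBox:"  { c = $2 " " $3 " " $4 " " $5 }
        END { if (m != "" && c != "" && m != c) print "-cropbox" }
    '
}

# Possible usage (not executed here, requires poppler-utils):
#   pdftocairo $(pdfinfo -box example.pdf | cropbox_flag) -r 300 -png example.pdf
```

The command substitution in the usage line is deliberately unquoted so that an empty result adds no argument at all.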
The pdftoppm utility, which uses a different rendering backend, needs a prefix for the output images (here example); the output is similar:
pdftoppm -r 300 -png example.pdf example
Please check the man pages of both commands for further options.
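Putting the pieces together, the requested custom_pdfcapture command could be sketched as a pair of shell functions. This is only one possible approach under stated assumptions: "actual size" is taken to mean 96 PPI (override it with the DPI variable), the images are written as PNG rather than with a .pdf extension, and capture_prefix is a hypothetical helper name. Each page is rendered separately with -f, -l and -singlefile so that the file name follows the ${name_of_pdf}_${page_number} pattern instead of pdftoppm's default hyphen-and-digits suffix.

```shell
#!/bin/sh
# Assumption: 96 PPI stands in for "actual size (100%)"; adjust to your display.
DPI="${DPI:-96}"

# Hypothetical helper: print the output prefix for a PDF path, i.e. the
# same directory and the file name without its extension.
capture_prefix() {
    dir=$(dirname -- "$1")
    base=$(basename -- "$1")
    printf '%s/%s\n' "$dir" "${base%.*}"
}

# Render every page of the given PDF as <name>_<page>.png next to the input.
custom_pdfcapture() {
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "usage: custom_pdfcapture FILE.pdf" >&2
        return 1
    fi
    prefix=$(capture_prefix "$1")
    # pdfinfo prints a line such as "Pages:          5"
    pages=$(pdfinfo "$1" | awk '/^Pages:/ { print $2 }')
    [ -n "$pages" ] || return 1
    page=1
    while [ "$page" -le "$pages" ]; do
        # -singlefile writes exactly <prefix>.png without appending digits
        pdftoppm -r "$DPI" -png -f "$page" -l "$page" -singlefile \
            "$1" "${prefix}_${page}" || return 1
        page=$((page + 1))
    done
}
```

After defining the functions (for example by sourcing the file), running custom_pdfcapture example.PDF would write example_1.png, example_2.png, ... into the directory of example.PDF. Rendering page by page is slower than a single pdftoppm call, but it avoids renaming the files afterwards.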