Is there a PDF viewer that lets me search the document text with a regular expression?
If I can't find one, I am thinking about extracting the text and layout from the PDF file with
less my.pdf > mytextfile
or pdftotext -layout
In the resulting text file, pages are separated by a form-feed character (Ctrl-L), and lines are separated by a line-feed (newline) character.
How can I find all matches of a given pattern in the text file and output their locations (the page number, and the line number within that page)?
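
To make the desired output concrete, here is a minimal sketch of what I have in mind, assuming Python and assuming the extracted text really does use form feeds between pages and newlines between lines. The function name `find_matches` and the sample string are made up for illustration, and a match that spans a line break would not be found this way.

```python
import re

def find_matches(text, pattern):
    """Yield (page, line, matched_text) for every regex match.

    Page numbers and per-page line numbers are 1-based. Pages are
    assumed to be separated by form feeds, lines by newlines.
    """
    regex = re.compile(pattern)
    for page_no, page in enumerate(text.split("\f"), start=1):
        for line_no, line in enumerate(page.split("\n"), start=1):
            for m in regex.finditer(line):
                yield page_no, line_no, m.group(0)

# Hypothetical two-page sample standing in for the extracted text file.
sample = "alpha\nbeta 12\n\fgamma 345\ndelta\n"
for page_no, line_no, hit in find_matches(sample, r"\d+"):
    print(f"page {page_no}, line {line_no}: {hit}")
```

For the real file, the contents of mytextfile would be read into a string and passed in place of `sample`.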