Debian GNU/Linux 11 (bullseye), grep (GNU grep) 3.6
I need to find a string in all files (doc, docx and pdf) in the current directory, but the grep command is not working for me:
grep -ril "word" .
It doesn't output anything. What's wrong?
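grep searches the raw bytes of each file, and none of these formats stores its text as plain, uncompressed bytes: .docx is a ZIP archive of XML, PDF text is usually held in compressed streams, and .doc is a binary format. All three need to be converted to text before they can be searched using tools such as grep.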
For “old-style” .doc files, use catdoc:
catdoc file.doc | grep word
For OOXML .docx files, use docx2txt:
docx2txt < file.docx | grep word
or
docx2txt file.docx - | grep word
For PDF files, use pdfgrep:
pdfgrep word file.pdf
or pdftotext:
pdftotext file.pdf - | grep word
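To get something close to the original grep -ril "word" . across a whole directory tree, the per-format commands above can be wrapped in a small loop. This is only a sketch: it assumes catdoc, docx2txt and pdftotext are installed, the helper names to_text and search_docs are made up for illustration, and it matches lowercase extensions only and does not handle file names containing newlines.

```shell
#!/bin/sh
# Sketch: list files under a directory whose extracted text contains a pattern.
# to_text and search_docs are hypothetical helper names, not part of any tool.

# Print the text content of one file on standard output,
# choosing the converter from the file extension.
to_text() {
    case "$1" in
        *.pdf)  pdftotext "$1" - ;;
        *.docx) docx2txt "$1" - ;;
        *.doc)  catdoc "$1" ;;
        *)      cat "$1" ;;
    esac
}

# search_docs PATTERN [DIR]
# Print the name of each .doc, .docx or .pdf file under DIR (default: .)
# whose text matches PATTERN case-insensitively, similar to grep -ril.
search_docs() {
    pattern=$1
    dir=${2:-.}
    find "$dir" -type f \( -name '*.doc' -o -name '*.docx' -o -name '*.pdf' \) |
    while IFS= read -r file; do
        # Keep the converter away from the file list on stdin
        # and hide its error messages for unreadable files.
        if to_text "$file" </dev/null 2>/dev/null | grep -qi -e "$pattern"; then
            printf '%s\n' "$file"
        fi
    done
}
```

After sourcing the file (or pasting the two functions into a shell), search_docs word . prints the matching file names, one per line.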
If you switch to ripgrep you can use a preprocessor:
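#!/bin/sh -
# ripgrep passes the file path as $1 and the file contents on stdin.
# Empty or missing files are passed through unchanged.
if [ ! -s "$1" ]; then exec cat; fi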
case "$1" in
*.pdf)
exec pdftotext - -
;;
*.doc)
exec catdoc -
;;
*.docx)
exec docx2txt - -
;;
*)
exec cat
;;
esac
Save this to a file, make it executable (chmod 755), and use it with --pre:
rg --pre /path/to/preprocessor word
See the ripgrep guide for tips on reducing the overhead of the preprocessor, for example by using --pre-glob to run it only on files that need conversion.