0

Please show me how this code can be made shorter:

find /home/peace/* -iname "*.pdf" -exec pdfgrep -i yello {} \; -exec cp {} /home/peace/Desktop/yello \;
find /home/peace/* -iname "*.pdf" -exec pdfgrep -i green {} \; -exec cp {} /home/peace/Desktop/green \;
find /home/peace/* -iname "*.pdf" -exec pdfgrep -i blue {} \; -exec cp {} /home/peace/Desktop/blue \;
find /home/peace/* -iname "*.pdf" -exec pdfgrep -i grey {} \; -exec cp {} /home/peace/Desktop/grey \;
find /home/peace/* -iname "*.pdf" -exec pdfgrep -i black {} \; -exec cp {} /home/peace/Desktop/black \;
find /home/peace/* -iname "*.pdf" -exec pdfgrep -i white {} \; -exec cp {} /home/peace/Desktop/white \;
3
  • What happens when multiple files contain e.g. green? In any case, it's pointless to a) run multiple find instances and b) keep searching a file after the pattern has been found, so you may experiment with something like find /some/path -name '*.pdf' -exec sh -c 'for f; do str=$(pdfgrep -iom1 "\<(yellow|green|blue|other_color)\>" "$f") && echo mv "$f" "/path/to/$str.pdf"; done' sh {} + Commented Jan 14, 2020 at 20:23
  • Replace mv with cp in my command above. Commented Jan 14, 2020 at 20:38
  • Generalise, parameterise, then iterate. Commented Jan 14, 2020 at 22:50

2 Answers

0

With zsh:

#! /bin/zsh -
cd /home/peace || exit
set -o extendedglob # for (#i)
pdf=(*.(#i)pdf(.ND)) # all pdf files case insensitive, including hidden ones
                     # in the current directory, only considering regular files.
colors=(yello green blue grey black white)

(($#pdf)) || exit 0 # exit with success if there's no pdf file (job already done)

ret=0
for color ($colors) {
  files=(${(0)"$(pdfgrep -ilZe $color -- $pdf)"}) # all files containing the color
                                                  # using NUL-delimited records
  if (($#files)) { # some files were found
    mv -i -- $files Desktop/$color/ || ret=$? # move them, record failures
    pdf=(${pdf:|files}) # remove the files from the list as we've moved them
  }
}
exit $ret

That minimizes both the number of pdfgrep invocations and the number of times the /home/peace directory is read.

0

I can see some room for improvement.

  • The first parameter to find should be a directory, not a path with "*" at the end;

  • find searches inside sub-directories by default, so you may want to stop it from descending into them (find parameter "-maxdepth 1");

  • restrict the find results to regular files, because if you have a directory named "something.pdf", you may not like the results (find parameter "-type f");

  • is cp's second parameter, /home/peace/Desktop/yello, the name of the result file? If pdfgrep finds "yello" in several PDF files, which one would be the correct result? If "/home/peace/Desktop/yello" is a directory, you should add "/" at the end. I assume it is a directory where the result files are placed.

So here we go:

find /home/peace/ -maxdepth 1 -type f -iname "*.pdf" -exec sh -c '
  for f do
    for i in yello green blue grey black white; do
      pdfgrep -iqe "$i" "$f" &&
        cp -f "$f" "/home/peace/Desktop/$i/"
    done
  done' sh {} + 

We can also add a check that the result directory exists.
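A minimal sketch of what that check could look like, assuming it is acceptable to create a missing directory with mkdir -p rather than skip the file. The function name copy_matches and the MATCHER variable are hypothetical, introduced only for this example; MATCHER defaults to pdfgrep and can be pointed at any command that accepts -iqe PATTERN FILE.

```shell
#!/bin/sh
# Sketch: same loop as above, but each target directory is created on demand.
# copy_matches and MATCHER are hypothetical names used for illustration only.
# Usage: copy_matches SRC_DIR DEST_DIR KEYWORD...
# Keywords must be plain words (no spaces or glob characters).
copy_matches() {
  src=$1 dest=$2
  shift 2
  find "$src" -maxdepth 1 -type f -iname '*.pdf' -exec sh -c '
    matcher=$1 dest=$2 words=$3
    shift 3
    for f do
      for w in $words; do
        # Only create the directory once a file actually matches.
        if "$matcher" -iqe "$w" "$f"; then
          mkdir -p "$dest/$w" && cp -f "$f" "$dest/$w/"
        fi
      done
    done' sh "${MATCHER:-pdfgrep}" "$dest" "$*" {} +
}
```

It could then be called as copy_matches /home/peace /home/peace/Desktop yello green blue grey black white. A file that matches several keywords is copied into each matching directory, as in the loop above.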

3
  • Thank you everybody for your quick and intimidating answers. Commented Jan 14, 2020 at 21:27
  • @Yurko: Regarding your question: all files in which yello is found should be copied into the directory "yello". Actually, I made a mistake, because pdfgrep does not work as I expected. Since it has no OCR capability, I have to use another tool. This tool works: tesseract /home/zac/Downloads/gescanntedokument/Bild.jpg beispiel -l deu But I didn't find a search option. Is it possible to replace pdfgrep with tesseract (with this tool I would work with JPG files) or another OCR tool? Commented Jan 14, 2020 at 21:44
  • @sun108: Of course you can use another tool instead of pdfgrep. I actually do not have pdfgrep installed here, so I just used your command, assuming it was right; it was later fixed by someone who edited my code to make it more readable. Commented Jan 15, 2020 at 13:59
