This is an excerpt from a much larger bash script (Fedora 30) that shows the problem I am having.

The script is supposed to traverse a directory tree and exit if it finds any filenames longer than 103 characters.

SS_NORMAL=0
JOLIET_MAX=103
MD5FILE=/tmp/blah.md5

function myExit
{
    echo Exiting $1 ...
    exit
}

function traverse
{
    find . -type f -print0 | 
       while IFS= read -r -d '' MD5_FILESPEC; do
          MD5_BASE=$(basename "$MD5_FILESPEC")
          if [ "${#MD5_BASE}" -gt "$JOLIET_MAX" ]; then
              myExit "[FAIL] Filename size ${#MD5_BASE} too long - $MD5_FILESPEC"
          fi
       done
}

traverse
date; date; date

It works fine until it finds one of those long filenames. It calls myExit and it exits the loop, but not all the way out of the script. I always see the three dates at the end of the output, and I should not.

How can I handle this?

  • | - the right side of a pipe is in a subshell. Ex. try echo | exit. Commented Nov 22, 2019 at 0:34
  • Sorry, but I don't understand how I should try it. Could you change my code in an answer so I can give you credit for it? Commented Nov 22, 2019 at 1:29
  • @JohnW: I think what KamilCuk meant is that if you run echo | exit on the bash command line, you will see that it does not exit bash either. You have to rewrite the logic of your script so that the exit is executed inside the process you actually want to leave. Commented Nov 22, 2019 at 7:32

2 Answers

3

An easy way to get around the problem that the while loop runs in a subprocess is to reverse the roles: have the while loop run in the main shell, and a sub-process generate the file names.

But how can we do that, if the receiver of the filenames from find has to be on the right of the | operator? The answer is that in GNU Bash, we have a language extension called "process substitution".

Process substitution is a piece of syntax that Bash converts into a string that looks and behaves like a file name, and is accepted that way by a command. When the program opens and reads that file (or, in the other direction, writes to it), it communicates through a pipe with another process.

Sketch of the idea:

   while IFS= read -r -d '' MD5_FILESPEC; do
      MD5_BASE=$(basename "$MD5_FILESPEC")
      if [ "${#MD5_BASE}" -gt "$JOLIET_MAX" ]; then
          myExit "[FAIL] Filename size ${#MD5_BASE} too long - $MD5_FILESPEC"
      fi
   done < <(find . -type f -print0)
   #      ^^^^^^^^^^^^^^^^^^^^^^^^^ this is the process substitution
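
A small, self-contained variation of the same pattern, assuming bash. The printf input and the count variable are placeholders for illustration and are not part of the original script. Because the loop runs in the current shell here, the variable it updates is still set after the loop:

```shell
#!/usr/bin/env bash

# <(...) expands to a file-like path connected to the command's output.
# On Linux this typically looks like /dev/fd/63, but the exact name may vary.
echo "substituted path:" <(true)

# The loop reads from the process substitution via ordinary redirection,
# so it runs in the current shell rather than in a pipeline subshell.
count=0
while IFS= read -r line; do
    count=$((count + 1))
done < <(printf 'a\nb\nc\n')

# count keeps its value because no subshell was involved in the loop.
echo "lines read: $count"
```

By the same reasoning, an exit inside such a loop ends the script itself.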

5 Comments

+1, This is a good approach too. I have never used a pipe like in OP's script, it had me confused for a minute.
@Z4-tier, in principle, when we have a command pipe like C1 | C2 | ...| CN, the shell could choose any one of the commands to use the same process, not necessarily the leftmost one. There should be a feature for indicating which pipeline element should be in the same process.
@Kaz bash doesn't run any part of a pipeline in the same process (except that jobs | ... is a special case). I'd argue that process substitutions are a way to control this.
@Kaz - thanks, perfect answer. I need to read up more on process substitution. For people reading this answer, the last line should be done < <(find . -type f -print0)
@JohnW Ah, of course; while doesn't open a file name argument. :)
1

As @KamilCuk mentioned, the pipe from find to read in the loop means the exit occurs in a sub-shell. Which is to say, when you have find . -type f -print0 | <do some stuff>, and <do some stuff> includes a call to exit, that exit applies within the subshell created by the |. In the case of your script, myExit is calling exit which ends the while-loop on the right side of the pipe, but this does not end the parent shell in which traverse was actually running. traverse then finishes, date is executed 3 times, and the script ends.
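
A minimal sketch of that behavior, assuming bash with its default settings (the lastpipe option off). The input lines and the status value 3 are made up for illustration:

```shell
#!/usr/bin/env bash

pipe_status=0

# The while loop is on the right side of the pipe, so it runs in a subshell.
# The exit below ends only that subshell; its status becomes the pipeline's status.
printf 'one\ntwo\n' | while IFS= read -r line; do
    echo "read '$line', calling exit"
    exit 3
done || pipe_status=$?

# Execution continues here in the parent shell.
echo "still running; the pipeline returned $pipe_status"
```

The last echo runs even though exit was called inside the loop, which is the same effect as the three date commands still running in the question.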

Probably the easiest option is to use set -e at the top of your script. This is kind of a scorched-earth approach: if any command returns non-zero (outside of a condition such as an if test), the script will exit.

To do this, replace the entire myExit function with this:

set -e

function myExit() {
    echo "Exiting $1 ..."
    return 1
}

Now you will get a non-zero return code when calling that function from inside the loop, and the script should exit immediately.
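
A rough way to try this out without set -e ending your own terminal session is to run the experiment in a child bash. This is only a sketch: the scan function, the single input line, and the use of false as a stand-in for a myExit that returns 1 are all assumptions made for the demo.

```shell
#!/usr/bin/env bash

status=0

# Run the experiment in a child bash so that set -e cannot end this demo itself.
output=$(bash -c '
    set -e
    scan() {
        printf "x\n" | while IFS= read -r item; do
            false    # failing command; stands in for myExit returning 1
        done
    }
    scan
    echo "after scan"    # not reached: the failed pipeline stops the script
') || status=$?

echo "child exit status: $status"
echo "child output: \"$output\""
```

The child script stops at the failing pipeline, so "after scan" is never printed and the child exits with a non-zero status.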

A more forgiving approach is to change myExit as above, but without adding set -e, and add a return 1 immediately after the call to myExit:

if [ "${#MD5_BASE}" -gt "$JOLIET_MAX" ]; then
    myExit "[FAIL] Filename size ${#MD5_BASE} too long - $MD5_FILESPEC"
    return 1
fi

And change the call to traverse to this:

if ! traverse; then exit 1; fi

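Now, when the if block fires, it calls myExit, which prints the message and returns. The return 1 that follows ends the subshell running the loop with status 1. Because the pipeline is the last command in traverse, traverse itself returns 1, the if ! traverse condition evaluates to true, and exit is called in the main shell.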
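
The status propagation can be sketched in isolation as follows, assuming bash. The names check_names and MAX_LEN and the sample inputs are hypothetical, and the loop uses exit 1 rather than return 1 to make it explicit that only the subshell ends:

```shell
#!/usr/bin/env bash

MAX_LEN=5    # hypothetical limit, standing in for JOLIET_MAX

check_names() {
    # The pipeline is the last command in the function, so its status
    # (the status of the while loop's subshell) becomes the return status.
    printf '%s\n' "$@" | while IFS= read -r name; do
        if [ "${#name}" -gt "$MAX_LEN" ]; then
            echo "[FAIL] name too long: $name" >&2
            exit 1    # ends only the subshell running the loop
        fi
    done
}

short_status=0
check_names ok fine || short_status=$?

long_status=0
check_names ok much_too_long || long_status=$?

echo "short names: $short_status, long name: $long_status"
```

The caller decides what to do with the non-zero status, which is what the if ! traverse line does in the main script.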

Another answer to your question suggests using process substitution, which is also a good solution here. That shows the more typical pattern used for this type of loop operation:

while ... read ... ; do 
  ... 
done < <(<input process>)

Finally, as a matter of stylistic preference: I would probably get rid of myExit as a separate function, unless you have plans to expand it. As-is, it seems like it just creates unneeded complexity.
