
Given this code:

#!/bin/bash
set -euo pipefail
function someFn() {
  local input_string="$1"
  echo "$input_string start"
  sleep 3
  echo "$input_string end"
}

function blocking() {
  someFn "FOO"
  someFn "BAR"
  someFn "BAZ"
  echo "DONE"
}

function wellBehavedParallel() {
  someFn "FOO" &
  someFn "BAR" &
  someFn "BAZ" &
  wait
  echo "DONE"
}

function subShellMadness() {
  (someFn "FOO" &) &
  (someFn "BAR" &) &
  (someFn "BAZ" &) &
  wait
  echo "DONE"
}

echo "BLOCKING_APPROACH"
blocking

echo "WEL_WORKING_PARALLEL"
wellBehavedParallel

echo "THIS DOES NOT WORK"
subShellMadness

It showcases two expected behaviors and one unexpected one.

  1. The blocking one

    Simple: it executes one line, then the next. Slow and boring, but solid:

    BLOCKING_APPROACH
    FOO start
    FOO end
    BAR start
    BAR end
    BAZ start
    BAZ end
    DONE
    
  2. The well-behaved parallel one. All the commands started with & run in parallel, the wait waits for all of them to finish, and only then does the main script progress further:

    WELL_WORKING_PARALLEL
    FOO start
    BAR start
    BAZ start
    FOO end
    BAR end
    BAZ end
    DONE
    
  3. This is (at least to me) unexpected, but I assume it is "by design": once I use subshells, I cannot use wait in the main script any more.

    The jobs still run in parallel, but I have lost all control: the main script even ends, and afterwards output is still dumped onto the terminal by the subshells:

    THIS DOES NOT WORK
    FOO start
    BAR start
    DONE
    BAZ start
    philipp@DESKTOP-H0QQ2H8:~$ FOO end
    BAR end
    BAZ end
    

Is there a way from a main script to wait for subshells to finish?

I want to avoid PID-collecting solutions (I know that wait accepts a PID as a parameter). From what I gather, getting the right PID in the first place may be prone to race conditions (since $! represents the last executed command's PID, not necessarily my command's), and I fear PID reuse could also make such an approach prone to unexpected behavior (am I waiting on my original command, or did some other process take my PID? When calling wait, I seemingly have no way of knowing).

Is there a best practice in dealing with waiting for subshells that reliably waits for them to finish?

(Not using subshells is not an option for me right now.)

  • This is similar to this other question, but more explicitly shows that you are running background tasks in subshells and calling wait outside of them. The wait must occur in the same subshell in which the asynchronous command was started. Commented Aug 9 at 13:49
  • Have you considered #4, (someFn "FOO" ) &? That gets you your subshell and a working wait Commented Aug 9 at 13:51
  • Is the explicit subshell (and doubled backgrounding) in (someFn "FOO" &) & really useful anyway? Anything you run in the background has to run in a distinct process that only has a copy of the shell environment... i.e. in a subshell. What's wrong with your middle solution? Commented Aug 9 at 13:55
  • You can only wait on your own children, so there's no danger of referring to an unrelated PID-reusing process unless you started it yourself. Commented Aug 10 at 13:01
  • Why not have your subshells wait for their children? (someFn "FOO" & wait) & etc. Commented Aug 10 at 13:04

4 Answers


This is (at least to me) unexpected, but I assume it is "by design": once I use subshells, I cannot use wait in the main script any more.

Yes, and likely for the same reason you're using subshells in the first place. The subshell knows the job PID, but it cannot communicate it to its parent shell (just as it can't communicate variable changes to the parent).

And on top of that, those job processes are no longer immediate children of the parent shell (due to there being an intermediate subshell process), so the parent shell won't receive SIGCHLD for them when they exit and is not allowed to waitpid() for their PIDs either.

(When the subshell exits, its orphaned child processes are instead reparented to init, i.e. PID 1. In theory Bash could request to become the default parent for such orphans, but that's very Linux-specific.)

So as already mentioned by others, you have to background the whole subshell, using & outside of the subshell (in the context of the parent) instead of backgrounding individual commands inside of the subshell.

(someFn "FOO") &

That said, & already implies a separate process – i.e. someFn & already runs in a subshell – so I'm not sure this is any different.
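
For illustration, here is a sketch of the question's third function rewritten that way (the function name is mine, not part of the original answer):

function parallelWithOuterBackgrounding() {
  (someFn "FOO") &   # each whole subshell is now a direct child of this shell
  (someFn "BAR") &
  (someFn "BAZ") &
  wait               # so this wait sees all three of them
  echo "DONE"
}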

since $! represents the last executed command's PID, not necessarily my command's

What other commands are there except yours?

$! is not system-wide; it is maintained by your shell and refers only to commands spawned directly by that shell. There is no global state that all processes somehow share like that.

And since you wrote the entire script yourself, everything directly run by the script is "your commands". Something else cannot inject its own commands into your shell script process, nor externally change the script's variables.

am I waiting on my original command, or did some other process take my PID? When calling wait, I seemingly have no way of knowing

The PID is not freed until something has waited for it (the exited process lingers as a zombie entry until then); wait is literally the operation that retrieves the result of an exited process.
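
And if you did go the PID-collecting route after all, a minimal sketch of doing it safely (the pids array name is my own; $! is read immediately after each &, so it always refers to the job that was just started):

pids=()
someFn "FOO" & pids+=("$!")
someFn "BAR" & pids+=("$!")
someFn "BAZ" & pids+=("$!")
for pid in "${pids[@]}"; do
  wait "$pid"   # each PID is a direct child and stays reserved until it has been waited for
done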


Alternative idea: have each subshell flock() a file. Implement waiting by attempting to lock the same file; if the lock succeeds, the subshell must have exited. I don't know how reliable this might be (I'm not sure how to solve the race condition between the parent and the background job both trying to take the lock first).

locks=$(mktemp -d /tmp/lockXXXXXX)      # private directory for the lock file
touch "$locks/1"
(
    exec {fd}>"$locks/1" &&             # open the lock file on a new descriptor
    flock -x "$fd" &&                   # hold an exclusive lock for as long as the job runs
    do_some_stuff &
) &
sleep 0.1   # this is the ugly part of this whole idea: give the job time to take the lock first
flock -x "$locks/1" /bin/true           # blocks until the lock has been released

These subshells don't wait for their children, and that's why they finish immediately:

  (someFn "FOO" &) &
  (someFn "BAR" &) &
  (someFn "BAZ" &) &

You get the same problem even if you don't background them:

  (someFn "FOO" &)
  (someFn "BAR" &)
  (someFn "BAZ" &)

If we can change them, then it's simple to have them remain until the children have terminated:

  (someFn "FOO" & wait) &
  (someFn "BAR" & wait) &
  (someFn "BAZ" & wait) &

If that's not possible (e.g. if the commands which abandon their children are not subshells under your direct control) then we can use file locking to observe when their process trees terminate (assuming that they don't close inherited file descriptors):

    local lockfile=$(mktemp)
    exec {fd}>"$lockfile"
    flock $fd || return 1

    (someFn "FOO" &) &          # inherits locked $fd
    (someFn "BAR" &) &
    (someFn "BAZ" &) &

    exec {fd}>&-                # parent releases lock
    # now wait for all children to close $fd (by exiting, we hope)
    flock "$lockfile" rm "$lockfile"

This is based on an idea suggested in grawity's answer, somewhat refined to avoid the race condition and to allow the use of unmodified child programs.
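
If the {fd} automatic file descriptor allocation is not available in your shell, the same approach should work with a hard-coded descriptor number, assuming that number (9 here) is not used for anything else:

    lockfile=$(mktemp)
    exec 9>"$lockfile"
    flock 9 || return 1

    (someFn "FOO" &) &          # inherits locked fd 9
    (someFn "BAR" &) &
    (someFn "BAZ" &) &

    exec 9>&-                   # parent releases its copy of the lock
    flock "$lockfile" rm "$lockfile"   # returns once every child has closed fd 9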

  • Note that the {fd}> redirection is a Bash extension. Not sure which other shells have that feature. Commented Aug 11 at 10:20

You can pipe the entire nested asynchronous subshell command to cat: cat keeps reading until every process that inherited the pipe's write end has exited (or closed it), so that stage of the pipeline only finishes once all of the background children are done:

#!/bin/bash
set -euo pipefail
function someFn() {
  local input_string="$1"
  echo "$input_string start"
  sleep 3
  echo "$input_string end"
}

function blocking() {
  someFn "FOO"
  someFn "BAR"
  someFn "BAZ"
  echo "DONE"
}

function wellBehavedParallel() {
  someFn "FOO" &
  someFn "BAR" &
  someFn "BAZ" &
  wait
  echo "DONE"
}

function subShellMadness() {
  (someFn "FOO" &) &
  (someFn "BAR" &) &
  (someFn "BAZ" &) &
  wait
  echo "DONE"
}

echo "BLOCKING_APPROACH"
blocking

echo "WEL_WORKING_PARALLEL"
wellBehavedParallel

echo "THIS DOES NOT WORK? ... OKay, it worked actually I think."
subShellMadness | cat -u # <-- here's my modification.

And this trick (i.e. piping to cat) is portable!
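
The mechanism can be seen in a stripped-down form (my own illustration, not part of the answer above): cat only sees end-of-file once every process that inherited the pipe's write end has exited, including the abandoned grandchild:

( (sleep 1; echo "grandchild done") & ) | cat
echo "this prints only after the grandchild has finished"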

  • It's only portable (POSIX /bin/sh) if you remove the function words and revert to standard syntax for defining functions. Commented Aug 10 at 13:07
  • local is also not POSIX, though fairly common across sh implementations (though with varying behaviour between them). Commented Aug 10 at 13:16
  • Hi all. What's portable is the pipeline and the cat command. I've preserved every other aspect of OP's source code for illustration. Commented Aug 10 at 14:01

Based on my other related answer, I was able to make use of trap 'wait' EXIT, though I have to inject it into every subshell as well.

#!/bin/bash
set -euo pipefail

function currentTime() {
  date +'%H:%M:%S'
}

function someFn() {
  local input_string="$1"
  echo "$(currentTime) Starting: $input_string"
  for i in {1..5}; do
    echo "$(currentTime) $input_string will end in $((6-i)) seconds"
    sleep 1
  done
  echo "$(currentTime) Ending: $input_string"
}

function mixedParallel() {
  (
    trap 'wait' EXIT

    (
      trap 'wait' EXIT
      someFn "FOO" &
    ) &

    (
      trap 'wait' EXIT
      someFn "BAR" &
    ) &

    someFn "BAZ" &
  )
  wait

  echo "$(currentTime) All tasks are DONE!"
}

echo "USING A TRAP AND A MIXTURE OF BACKGROUND AND SUBSHELL COMMANDS"
mixedParallel
echo "$(currentTime) only if all commands in mixedParallel are finished this will execute!"

This runs as expected:

$ ./trap.sh
USING A TRAP AND A MIXTURE OF BACKGROUND AND SUBSHELL COMMANDS
19:05:53 Starting: BAR
19:05:53 Starting: FOO
19:05:53 Starting: BAZ
19:05:53 BAR will end in 5 seconds
19:05:53 BAZ will end in 5 seconds
19:05:53 FOO will end in 5 seconds
19:05:54 FOO will end in 4 seconds
19:05:54 BAZ will end in 4 seconds
19:05:54 BAR will end in 4 seconds
19:05:55 BAZ will end in 3 seconds
19:05:55 BAR will end in 3 seconds
19:05:55 FOO will end in 3 seconds
19:05:56 BAZ will end in 2 seconds
19:05:56 BAR will end in 2 seconds
19:05:56 FOO will end in 2 seconds
19:05:57 BAZ will end in 1 seconds
19:05:57 BAR will end in 1 seconds
19:05:57 FOO will end in 1 seconds
19:05:58 Ending: BAZ
19:05:58 Ending: FOO
19:05:58 Ending: BAR
19:05:58 All tasks are DONE!
19:05:58 only if all commands in mixedParallel are finished this will execute!
$

This is a bit of a hack, as ideally I could leave the code of the subshells as is; but since I control this part of my application, it is good enough for now.
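
Distilled to its essence (a minimal sketch with a single level of nesting, not the exact code above): each subshell's EXIT trap makes it wait for its own background children, so the parent's wait has a well-behaved direct child to wait for.

(
  trap 'wait' EXIT   # before this subshell exits, wait for its own background jobs
  someFn "FOO" &
  someFn "BAR" &
) &
wait                  # the main shell now only needs to wait for the subshell itself
echo "DONE"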
