
I start a long-running background process from a Bash script. After sending the process to the background, I save its PID in a variable, and I use that PID to kill the process when necessary.

However, if that background process terminates on its own before my script kills it, and the system assigns the same PID to a newly created process, then using that number to kill my background process would probably kill the new process instead (depending on permissions, of course).

I'm aware that a used PID is not normally reassigned to a new process within a short time, but my script runs for weeks, so reuse is possible.

How can I prevent such an accident from happening?

  • You don't have to kill it with the command kill PID; just try the more explicit pkill <name> command. Would this solve your problem? Commented Sep 20, 2020 at 23:38
  • You can run the background process with some special environment variable. Then, on Linux, check the environment variable via /proc/PID/environ. Or maybe just checking /proc/PID/exe is sufficient? Commented Sep 21, 2020 at 0:53
  • What operating system(s) do you need to support? Commented Sep 21, 2020 at 0:54
  • @Mikel It needs to be accomplished on Linux only. Commented Sep 21, 2020 at 8:58
  • @Mikel You are suggesting labeling each background process with a unique ID through an environment variable? That's clever. Why don't you move your comment to an answer? Commented Sep 21, 2020 at 10:01
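A minimal sketch of the environment-variable labeling suggested in the comments, assuming Linux and a background process owned by the same user (/proc/PID/environ is not readable for other users' processes). JOB_TAG is a hypothetical variable name, and the unique value comes from /proc/sys/kernel/random/uuid. Note that environ shows the environment the program was started with via execve(), and that check-then-kill is still not atomic; it only narrows the window.

```bash
#!/bin/bash
# Sketch: tag a background job with a unique environment variable
# (JOB_TAG is a hypothetical name) and check the tag before killing.

kill_if_tagged() {
    local pid="$1" tag="$2" env_file="/proc/$1/environ"
    [[ -r $env_file ]] || return 1           # gone, or not ours to read
    # environ holds NUL-separated KEY=VALUE entries from execve() time.
    if tr '\0' '\n' < "$env_file" | grep -qxF "JOB_TAG=$tag"; then
        kill "$pid"
    else
        return 1                             # PID now belongs to something else
    fi
}

tag="$(cat /proc/sys/kernel/random/uuid)"    # unique label for this job
JOB_TAG="$tag" sleep 600 &
pid=$!

sleep 1   # demo only: give the child time to exec, so environ is updated
kill_if_tagged "$pid" "$tag" && echo "killed tagged job $pid"
```

In a real script the kill happens much later, so the one-second pause is only there to make the demo deterministic.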

6 Answers


As suggested in the comments, the pkill utility might be of use.

Since you say "bash script", you would most likely have to run pkill bash, and that is something you shouldn't do.

Instead, you can use pkill -f <pattern>, which matches the pattern against the full command line rather than just the process name. So, assuming your task is bash /home/me/my_script.sh, you can use the following:

pkill -f -e my_script.sh

The -e is optional and simply prints out what is killed.


Alternative:

Save the following script as /usr/bin/mykill (or anywhere you want):

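#!/bin/bash
# Show the command line of a PID and ask for confirmation before killing it.
mypid="$1"
if [[ ! $mypid =~ ^[0-9]+$ || ! -f /proc/$mypid/cmdline ]]; then
    echo "Process ID not found."
    exit 1
else
    # cmdline is NUL-separated; turn the separators into spaces for display
    echo "About to kill: $(tr '\0' ' ' < "/proc/$mypid/cmdline")"
    echo "Press Enter if you want to kill that process"
    read -r -p "Press Ctrl-C if you don't want that "
    kill "$mypid"
fi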

And run it as mykill <pid>


If your background process is under your control, add extra identification to its command line as a label. Keep a copy of the label alongside the PID, and later check for it in the output of ps -o args -p "$myPid".

I use an option like --unique "${myTag}"

I derive myTag from either uuidgen or a date with nanosecond accuracy (date +%s%N). If it is an ssh job, include the local hostname.
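A minimal sketch of the label check, assuming your program accepts the --unique option described above; ./worker is a hypothetical stand-in for it, so its launch appears only in comments. ps -o args= -p PID prints the full command line without a header, and the hypothetical function kill_if_labelled refuses to kill unless the saved label still appears there as whole words. There is still a small window between the check and the kill.

```bash
#!/bin/bash
# Sketch: kill a saved PID only if its command line still carries our label.
# Hypothetical launch (./worker stands in for your own program):
#   myTag="$(cat /proc/sys/kernel/random/uuid)"
#   ./worker --unique "$myTag" &
#   myPid=$!

kill_if_labelled() {
    local pid="$1" label="$2" args
    args="$(ps -o args= -p "$pid")" || return 1   # no such process
    # Require the label as a space-delimited run of words, not a substring.
    [[ " $args " == *" $label "* ]] || return 1
    kill "$pid"
}

# Later, with the hypothetical job above:
#   kill_if_labelled "$myPid" "--unique $myTag"
```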

If you cannot introduce a new option:

.. Use date +%s to get the start time of the job, and store it with the PID.

.. Use ps -o etimes= -p <pid> to get the elapsed time of the process in seconds.

.. Compare the stored start time plus the elapsed time with the current date +%s (probably with a few seconds of tolerance).

Either method, in conjunction with the Pid, should have negligible probability of error.
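A sketch of the elapsed-time variant, assuming procps-ng ps (which provides the etimes field) and a clock that is not changed while the job runs. start_job, is_same_job, job_pid and job_start are hypothetical names, and the 5-second default tolerance is an arbitrary choice.

```bash
#!/bin/bash
# Sketch: record a job's start time, later accept its PID only if
# start time + elapsed time (ps -o etimes=) is close to "now".

start_job() {
    "$@" &
    job_pid=$!                 # hypothetical global names
    job_start=$(date +%s)
}

is_same_job() {
    local pid="$1" started="$2" tolerance="${3:-5}" etimes now diff
    etimes="$(ps -o etimes= -p "$pid")" || return 1   # process is gone
    etimes="${etimes//[[:space:]]/}"                  # ps pads the number
    now=$(date +%s)
    diff=$(( now - etimes - started ))
    diff=${diff#-}                                    # absolute value
    (( diff <= tolerance ))
}

start_job sleep 600
# ... much later ...
if is_same_job "$job_pid" "$job_start"; then
    kill "$job_pid"
fi
```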

  • The only weakness of the "check the run time" solution might be that if the computer's date settings are altered (intentionally) during the waiting period, the parent script would believe that the PID doesn't belong to it. Commented Sep 21, 2020 at 9:44
  • @ceremcem I learned recently that VMs can run the clock in slo-mo. I use NTP and avoid VMs, so I tend to trust my date output. An uncooperative child can evade all searches, e.g. by saving its state to a file, exec()-ing itself under a different name, and reloading the state. You might monitor it every minute and record the time it disappeared, so that a reuse of the PID would be known to be a false positive. Commented Sep 21, 2020 at 21:23

I combine the PID with the timestamp of /proc/<PID> into a unique ID, so I don't have to worry about killing the wrong PID.

Save $PID:

echo "$PID $(stat --format %Z "/proc/$PID/comm")" > pid

Kill $PID safely:

read -r PID TIMESTAMP < pid
[[ $(stat --format %Z "/proc/$PID/comm" 2>/dev/null) != "$TIMESTAMP" ]] || kill -SIGKILL "$PID"

This way, even if $PID gets recycled for another process, that process's creation time (the timestamp of /proc/$PID/comm) will be different, so the PID/timestamp pair practically cannot be duplicated.

PS: [[ ... ]] || cmd means: if [[ ... ]] is true, do nothing; otherwise, run cmd.
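A variation on the same idea, swapped in here rather than taken from the answer: as far as I know, the ctime of a /proc/<PID> entry is set when the kernel instantiates that inode rather than when the process starts, so it can change if the entry is dropped from cache (which makes the check refuse to kill, not kill the wrong process). Field 22 of /proc/<PID>/stat (starttime, in clock ticks since boot, documented in proc(5)) stays fixed for the life of the process. proc_starttime and kill_if_same_start are hypothetical names.

```bash
#!/bin/bash
# Sketch: identify a process by the pair (PID, starttime), where starttime
# is field 22 of /proc/PID/stat (clock ticks after boot, see proc(5)).

proc_starttime() {
    local line rest
    line="$(cat "/proc/$1/stat" 2>/dev/null)" || return 1
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so cut everything up to the last ") " before splitting on spaces.
    rest="${line##*) }"
    local -a fields
    read -r -a fields <<< "$rest"
    # "rest" starts at field 3, so field 22 sits at array index 19.
    echo "${fields[19]}"
}

kill_if_same_start() {
    local pid="$1" saved="$2" current
    current="$(proc_starttime "$pid")" || return 1   # process is gone
    [[ "$current" == "$saved" ]] || return 1         # PID was recycled
    kill "$pid"
}

sleep 600 &
PID=$!
pidfile="$(mktemp)"
echo "$PID $(proc_starttime "$PID")" > "$pidfile"   # save, as the answer does

read -r PID STARTTIME < "$pidfile"                  # later: kill safely
kill_if_same_start "$PID" "$STARTTIME" && echo "killed $PID"
rm -f "$pidfile"
```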

EDIT: There are many other workarounds, but why not solve the problem directly? This should be the simplest direct way: no background service is needed, and no advanced system components such as systemd, which is not well supported in most containers. I have used this approach in a product that needs to interrupt other ongoing processes.

EDIT2: If you kill your background processes, you'd better start each one in a separate process group, e.g. setsid background_process &. Then any subprocess of background_process can get the process group ID with ps -o pgid= $$, and the killer side can kill the whole process group (kill -- -PGID), so all subprocesses receive the signal at once. Otherwise, if you kill a plain process ID, its children stay alive, and even with pkill -P parent_pid there is still a chance that a new child escapes.
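A minimal sketch of the process-group approach from EDIT2, assuming util-linux setsid and a non-interactive script. Without job control the background child is not a process group leader, so setsid does not need to fork and $! is also the new PGID. Before signalling, the hypothetical kill_group function confirms with ps that the saved number still names a group leader; if the leader has already exited, it refuses rather than signalling a leftover group. The bash -c command is only a stand-in job with children of its own.

```bash
#!/bin/bash
# Sketch: start a job in its own session/process group via setsid, then
# signal the whole group. Assumes no job control (a plain script), so the
# background child is not a group leader, setsid does not fork, and $!
# is both the job's PID and its new PGID.

kill_group() {
    local pgid="$1" actual
    actual="$(ps -o pgid= -p "$pgid")" || return 1   # leader is gone
    actual="${actual//[[:space:]]/}"                 # ps pads the number
    [[ "$actual" == "$pgid" ]] || return 1           # not a group leader
    kill -TERM -- "-$pgid"        # negative PID = every process in the group
}

# Stand-in job that has children of its own.
setsid bash -c 'sleep 600 & sleep 600 & wait' &
leader=$!
sleep 1   # demo only: let setsid create the new session first

kill_group "$leader" && echo "signalled process group $leader"
```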


You can use the jobs command to check whether that job has terminated. The output looks like the following:

➜  ~ vim test.txt &
[1] 5634
➜  ~
[1]  + 5634 suspended (tty output)  vim test.txt
➜  ~ jobs
[1]  + suspended (tty output)  vim test.txt
➜  ~ kill -9 5634
[1]  + 5634 killed     vim test.txt
➜  ~ jobs
➜  ~

In this case, vim test.txt appears at the end of the jobs line and can be used to check whether the job is still running and is the correct program. If jobs does not list the target program, your script can skip killing the PID.

  • Note that OP is talking about a script, not interactive use. Commented Sep 21, 2020 at 0:51

I disagree that this is not possible, in general. And I would not recommend killing by-name with pkill if you can avoid it (what if you legitimately want multiple instances of some command to have individual timeouts?). I do not see how the answer regarding jobs does anything at all to eliminate the race condition; using jobs and then kill is not atomic.

However, we can use Process Group IDs (PGIDs) if job control is enabled, or use a subshell and kill by Parent Process ID (PPID) if that is not an option. See my post here: https://unix.stackexchange.com/a/649320/464414

I'll only excerpt the preferred method and paste it here; see the above post for notes and alternatives.

UPDATED: The below version now works with pipes

timeOut() {
    checkArgs() { [ $(( ${1} )) -gt 0 -a "${*:2}" ]; }
    jobControlEnabled() { expr "${-}" : '.*m' >/dev/null; }
    terminalFDs() { [ -t 0 -a -t 1 ]; }
    groupLeader() { sh -c 'expr `ps -o pgid= ${PPID}` : "${PPID}" >/dev/null;'; }
    timeOutImpl() {
        groupLeader || { echo "Job control error - not group leader!"; return -1; }
        KILL_SUB="kill -- -`sh -c 'echo ${PPID}'`"
        { sleep ${1}; ${KILL_SUB}; } &
        "${@:2}"; ${KILL_SUB}
    }
    checkArgs "${@}" || { echo "Usage: timeOut <delay> <command>"; return -1; }
    if jobControlEnabled && terminalFDs; then
        ( timeOutImpl "${@}"; )
    else
        ( set -m; ( timeOutImpl "${@}"; ); )
    fi
}

I realize this problem is likely not an issue anymore, but I did not find the presented alternatives very appealing, while some suggestions that I found very good were not explored in depth.

First: how are PIDs assigned?

PIDs are allocated incrementally and wrap around once they reach the maximum (/proc/sys/kernel/pid_max on Linux). So, for a PID to be reassigned, a good part of the PID space must be traversed first (depending on how many processes are running).
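To see where that wrap-around happens on a given Linux machine, you can read /proc/sys/kernel/pid_max, which proc(5) describes as the PID allocation wrap value:

```bash
#!/bin/bash
# Read the PID wrap-around value on Linux (see proc(5), pid_max).
pid_max=$(< /proc/sys/kernel/pid_max)
echo "PIDs wrap around once the next PID would reach $pid_max"
```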

This, at least, gives a bit of breathing time.

https://www.baeldung.com/linux/process-id

Second: if there is a really long time between recording the PID and killing it, AND there is enough process-creation activity on the system during that gap, it is plausible that the process has died and its PID has been reused.

While I don't know of any atomic "search and kill" operation, the fact that PIDs are assigned in order gives a little breathing space.

Your best bet is to check whether the process is still the same one. For that, you can:

  • add artificial markers to the command line, as some have suggested
  • check that $MYPID still belongs to process group $MYPGID, e.g. with ps -o pgid= -p $MYPID
  • check that $MYPID still has $MYPARENT as its parent, e.g. with ps -o ppid= -p $MYPID -> this is my choice

Managing artificial markers is an extra hassle in my view, but using a UUID is probably the safest.

A dedicated process group may not exist, but if it does, it is a pretty good bet, although the group may contain too many processes, especially if there are many backgrounded payload processes.

Since the backgrounded process is started from a parent script, the parent PID is a given. There is a chance that, after wraparound, the same parent-PID/PID pair comes up again, but the chances are pretty slim. As a very rough estimate, the odds are on the order of 1 in 65535 × 65535 ≈ 4.3 × 10^9, which, if a process is created every millisecond, works out to about once every 50 days. With a lot of machines running for a long time, this is not a good solution. But it really depends on how the parent process behaves: if it stays around and spawns a ton of new processes, the numbers can be very different.
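A minimal sketch of the parent-PID check (the last option above), assuming the job is started directly by the script, so its parent is the shell that ran `&`. It uses ps -o ppid= rather than pkill, and kill_if_child_of is a hypothetical name; like the other checks here, it narrows the race window rather than closing it.

```bash
#!/bin/bash
# Sketch: before killing a saved PID, confirm that its parent is still
# the shell that started it. $BASHPID is this shell's own PID (equal to
# $$ outside subshells), i.e. the parent of a job started with "&".

kill_if_child_of() {
    local pid="$1" parent="$2" ppid
    ppid="$(ps -o ppid= -p "$pid")" || return 1   # process is gone
    ppid="${ppid//[[:space:]]/}"                  # ps right-aligns the number
    [[ "$ppid" == "$parent" ]] || return 1        # PID reused by a stranger
    kill "$pid"
}

sleep 600 &
MYPID=$!
MYPARENT=$BASHPID
# ... much later ...
kill_if_child_of "$MYPID" "$MYPARENT" && echo "killed $MYPID"
```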
