
From one of my scripts I called the find command as a normal user (not root).
It never returned, so I killed the script, but find is still running.
In htop it constantly uses 100% of one core (of the 4 cores here).
The core that sits at 100% changes from time to time, by the way.
Its state in htop is 'R' (running), and that does not change after the kill signals below.

I have tried SIGKILL, SIGSTOP, SIGTERM, SIGABRT, HUP and 15; none of them works.
Using sudo makes no difference.

I also tried every signal that kill -l lists:

sigs=($(kill -l | grep -o '[0-9]*)' | tr -d ')'))  # every signal number from kill -l
for sig in "${sigs[@]}"; do
  echo "======== $sig"; kill -"$sig" 2315444; ps -o pid,stat,status,state,pcpu,cmd -p 2315444; sleep 1
done

but after each signal the result is always the same:

PID STAT STATUS S %CPU CMD
2315444 RN        - R 99.5 find

AppArmor is running, but find has no profile in it (I checked), and stopping AppArmor did not help either. SELinux is not running, and I have not yet found a way to check for other LSMs here.

Thinking about this, I forcibly unmounted the partition that find was running on (which would cause no problems), and afterwards find was still running.

What else can I try, other than rebooting?
There is nothing special in dmesg either. Could it be a hardware failure, or a kernel bug?

I think it could have happened with any other process, though I am not sure. Maybe it is related to processes that do hard drive I/O?

OS: Ubuntu 16.04

  • Have you tried kill -9? Commented Aug 17, 2017 at 19:31
  • 1
    @Jesse_b that's -SIGKILL in kill -l :) Commented Aug 17, 2017 at 19:33
  • Is it running in its own PGID or is it still under the PGID of your script? If the latter, have you tried to kill -9 the whole process group? Commented Aug 17, 2017 at 19:37
  • Can you strace it to maybe see what it's doing? Commented Aug 17, 2017 at 19:41
  • 2
    Is this an epidemic? If many people are having the same problem all of a sudden, maybe a recent kernel upgrade is buggy? (Unlikely though to have the same bug on RHEL and Ubuntu as they use different kernel versions.) Commented Aug 17, 2017 at 23:25
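
The checks suggested in the comments above (process group, what the process is doing) can be gathered in one place. This is a minimal sketch, assuming a Linux /proc filesystem; the helper name proc_summary is made up for illustration and is not part of any tool.

```shell
# Hypothetical helper: print the kernel's view of one process.
# Argument: the PID to inspect.
proc_summary() {
  # State plus pending/blocked/ignored/caught signal masks
  grep -E '^(Name|State|PPid|SigPnd|ShdPnd|SigBlk|SigIgn|SigCgt):' "/proc/$1/status"
  # Process group ID: third field after the "(comm) " part of /proc/PID/stat
  awk '{ sub(/.*\) /, ""); print "PGID: " $3 }' "/proc/$1/stat"
}

# Demonstrate on the current shell; replace "$$" with the stuck PID
proc_summary "$$"
```

With the PGID in hand, the whole group can be signalled with `kill -9 -- -PGID`. If the kernel exposes it, `sudo cat /proc/PID/stack` may also show where a process is stuck inside the kernel, which strace cannot see.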

1 Answer


I was able to avoid rebooting by using the commands below:

sudo cgcreate -g cpu:/cpulimited
sudo cgclassify -g cpu:cpulimited 2315444  # the PID of find
cd /sys/fs/cgroup/cpu/cpulimited
echo 1000000 | sudo tee cpu.cfs_period_us
echo 1000 | sudo tee cpu.cfs_quota_us  # cannot be less than 1000, as I tested

Read the full explanation of cpu.cfs_quota_us here (found via this tip).
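
A minimal sketch of the arithmetic behind the two values written above, assuming the usual reading of CFS bandwidth control: the group may run for cpu.cfs_quota_us microseconds within each cpu.cfs_period_us microseconds, so their ratio is the share of one core it can get. The helper name cpu_share_percent is hypothetical.

```shell
# Hypothetical helper: share of one core allowed by a CFS quota/period pair.
# Arguments: quota in microseconds, period in microseconds.
cpu_share_percent() {
  awk -v q="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * q / p }'
}

# Values from the commands above: 1000 us of CPU time per 1000000 us period
cpu_share_percent 1000 1000000   # prints 0.1
```

So the stuck process is held to roughly 0.1% of one core. On the live system, the nr_throttled counter in cpu.stat in the same cgroup directory should keep growing while the limit is being applied.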

The cgroup magic works even on such an unkillable process!

Although ps still shows pcpu as 98%, every other system monitor (htop, top and the "System Monitor" application) shows the process using almost no CPU.
So the machine runs smoothly again; that single process pinned at 100% had been making it slow to a halt for a second at intermittent intervals.

An answer covering ways other than kill to end such a process would still be better, though.

Thanks to you all for the tips!

  • Nice workaround! Commented Aug 17, 2017 at 23:25
  • RedHat documentation link is 404, found this however: stackoverflow.com/a/55914709/130638 Commented Sep 28, 2021 at 11:17
  • How does one revert this change once the process goes away? Commented Sep 28, 2021 at 11:28
  • @balupton cgdelete? but if I am not wrong it will vanish on reboot too. Commented Sep 29, 2021 at 5:01
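
Following up on the cgdelete suggestion, here is a hedged sketch of how the revert might look, assuming the libcgroup tools and the cgroup v1 path used in the answer. A cgroup can only be removed once it has no member processes, so the sketch checks that first. Both function names are made up for illustration.

```shell
# Hypothetical helper: succeeds when the cgroup directory lists no processes.
# cgroup pseudo-files report size 0, so read the file instead of testing its size.
cgroup_is_empty() {
  [ -z "$(cat "$1/cgroup.procs" 2>/dev/null)" ]
}

# Defined only, not invoked here: removes the group created in the answer.
remove_cpulimited() {
  if cgroup_is_empty /sys/fs/cgroup/cpu/cpulimited; then
    sudo cgdelete -g cpu:/cpulimited
  else
    echo "cpulimited still has processes; not removing" >&2
    return 1
  fi
}
```

Run `remove_cpulimited` once the stuck process is gone. If it is still alive, `sudo cgclassify -g cpu:/ PID` should move it back to the root group first, which also lifts the CPU limit.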
