I have a desktop computer (Intel i4770) running Oracle Linux 7.9 with kernel 4.1.12-61. I usually keep it off and only turn it on when I occasionally need to test something. A month or so ago, I turned it on and noticed that the fans were running at max speed. I checked top and found that setroubleshoot was at 100%, so I killed the process. It kept coming back and I kept killing it, but ultimately it didn't matter much because my testing was done and I turned the computer off again. (Yes, I always shut it down the right way.)

Now that I'm trying to get to the root of the problem, setroubleshoot is no longer showing 100% in top. In fact, nothing is even close to 100%. Running htop, I can see per-CPU details, and 4 of the 8 cores are pinned at 100% from the time the computer lets me log in until I shut it down. But nothing in the list of processes is above 5.2%.
[Screenshot: htop showing 4 cores at 100% with no processes that would account for it]

When I run perf on each core with `perf top -C 1 --sort comm` (changing the CPU number each time), I can see that cores 0 through 3 are all 100% kernel.
[Screenshot: perf output for a single CPU]

Here is the perf report from running `perf record -a -F 999 -- sleep 10`. I don't know whether the failure to find useful symbols is indicative of the problem I'm chasing, a different issue that I'll need help figuring out, or something that should be ignored.
[Screenshot: perf report]

On the desktop of this computer, I noticed a bunch of SELinux errors. They all appear to be saying that there was an attempt to execute something that should not have been allowed.
[Screenshot: SELinux errors]

And just to confirm that htop was right about what it was reporting, here's the report from the System Monitor.
[Screenshot: System Monitor display showing 8 cores with 4 at 100%]

Booting into a prior version of the kernel didn't help. And booting into the "rescue" kernel didn't help either.

I tried updating the kernel but that didn't help. I ran a software update and that didn't help either. Note that I hadn't done any updates or installed any new software immediately prior to this problem starting. This install had been stable for years when I needed it.

I also tried installing the same OS over again on a new external drive. That worked, with no issues on that drive. But when I boot to that drive and then choose the kernel that is on the main drive, the problem returns. That all seems to show that the kernel itself isn't the issue and that something is wrong on the main system drive.

I'm at a loss for how to debug further. I can't figure out what changed and why so I don't know how to even start fixing it. Any help about where to look and what to check would be appreciated!

___ Edit 1: ___

Output from `ps -efl|sort -rk14|head`:

F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY       TIME CMD
0 S gdm       2085  2041  2  80   0 - 905731 -     17:42 ?        00:00:02 /usr/bin/gnome-shell
4 S root         1     0  1  80   0 - 54811 -      17:41 ?        00:00:01 /usr/lib/systemd/systemd --switched-root --system --deserialize 22
4 S root       837     1  1  80   0 - 22671 -      17:42 ?        00:00:01 /sbin/rngd -f
1 S root       405     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs_mru_cache]
1 S root       407     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-data/sda1]
1 S root       408     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-conv/sda1]
1 S root       409     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-cil/sda1]
1 S root       406     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfs-buf/sda1]
1 S root       404     2  0  60 -20 -     0 -      17:41 ?        00:00:00 [xfsalloc]

Output from `dmesg | grep libsystem`:

[    3.344918] audit: type=1400 audit(1731105719.768:4): avc:  denied  { execute } for  pid=496 comm="systemd-journal" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[    3.351928] audit: type=1400 audit(1731105719.775:6): avc:  denied  { execute } for  pid=502 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[    3.351929] audit: type=1400 audit(1731105719.775:5): avc:  denied  { execute } for  pid=503 comm="systemd-readahe" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:readahead_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[    3.374010] audit: type=1400 audit(1731105719.797:7): avc:  denied  { execute } for  pid=513 comm="systemd-tmpfile" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_tmpfiles_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[    3.386383] audit: type=1400 audit(1731105719.810:8): avc:  denied  { execute } for  pid=525 comm="systemd-sysctl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:systemd_sysctl_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[    3.397457] audit: type=1400 audit(1731105719.821:9): avc:  denied  { execute } for  pid=536 comm="hostname" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:hostname_t:s0 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
[    3.673352] audit: type=1400 audit(1731105720.097:10): avc:  denied  { execute } for  pid=657 comm="alsactl" path="/usr/local/lib/libsystem.so" dev="sda1" ino=136977390 scontext=system_u:system_r:alsa_t:s0-s0:c0.c1023 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
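
For anyone triaging similar output, one rough way to see which files the denials point at is to count them by path. This is only a sketch: `summarize_avc` is a name made up for illustration, and it assumes the `path="..."` field format shown in the lines above.

```shell
# summarize_avc (hypothetical helper): read kernel/audit log lines on stdin
# and print "<count> <path>" for every path named in an AVC denial.
summarize_avc() {
    awk '
        /avc: +denied/ && match($0, /path="[^"]*"/) {
            # Drop the 6-character path=" prefix and the closing quote.
            target = substr($0, RSTART + 6, RLENGTH - 7)
            hits[target]++
        }
        END { for (target in hits) printf "%d %s\n", hits[target], target }
    ' | sort -rn
}

# Typical use on a live system:
#   dmesg | summarize_avc
```

In the output above, every denial names the same file, and its target type is user_tmp_t. As far as I know, a library installed under /usr/local/lib would normally be labelled lib_t, so the label itself is worth a second look.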

___ Edit 2: ___

I installed kernel debug info and downgraded perf to version 3 since apparently there is a bug with the perf version for OL7.9. However there are still symbols that can't be found.
[Screenshot: second perf report]

The number in the list above, 1399, is the PID of the process, but that process isn't visible through either htop or ps. As soon as I ran `kill -9 1399`, the CPU usage immediately dropped to zero. That's nice because at least the problem process is now dead, and I know how to kill it even though I don't see it in the normal process lists.
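
A process that responds to kill but is missing from ps and htop can sometimes be found by probing /proc/<pid> directly instead of relying on a directory listing. The following is a minimal sketch of that idea, not a vetted tool: `find_hidden_pids` is a hypothetical name, and the approach assumes that direct path lookups are not being filtered the way listings are.

```shell
# find_hidden_pids (hypothetical helper): print PIDs that can be probed
# directly under PROC_ROOT but are missing from the LISTED set.
# Usage: find_hidden_pids PROC_ROOT MAX_PID "whitespace separated pids"
find_hidden_pids() {
    local proc_root="$1" max_pid="$2" listed pid tgid
    # Normalise the listed PIDs to " pid pid pid " for whole-word matching.
    listed=" $(printf '%s' "$3" | tr -s '[:space:]' ' ') "
    pid=1
    while [ "$pid" -le "$max_pid" ]; do
        if [ -r "$proc_root/$pid/status" ]; then
            # /proc/<tid> also resolves for threads, so keep only
            # thread-group leaders (Tgid equal to the PID itself).
            tgid=$(awk '/^Tgid:/ { print $2 }' "$proc_root/$pid/status" 2>/dev/null)
            if [ "$tgid" = "$pid" ]; then
                case "$listed" in
                    *" $pid "*) ;;                 # ps already shows it
                    *) printf '%s\n' "$pid" ;;     # exists, but not listed
                esac
            fi
        fi
        pid=$((pid + 1))
    done
}

# Typical use as root on a live system:
#   find_hidden_pids /proc "$(cat /proc/sys/kernel/pid_max)" "$(ps -e -o pid=)"
```

Short-lived processes can show up as false positives because they start between the ps snapshot and the scan, so a PID is only interesting if it appears on repeated runs.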

But the fundamental question remains - where is this process coming from and how do I stop it from starting in the first place!?

  • Where did /usr/local/lib/libsystem.so come from? I've never used Oracle Linux, so am not sure how common it is to have OS provided things under local, but on any other distro I'd assume it's something you compiled from source. It might make sense to check current logs and the output of dmesg for irregularities. Commented Nov 8, 2024 at 20:22
  • Can you provide the output (as text, not image) of `ps -efl|sort -rk14|head`? Commented Nov 8, 2024 at 20:37
  • when you install kernel debug symbols, you will get a much nicer perf output Commented Nov 8, 2024 at 20:45
  • @ktbos it shouldn't be hard; docs.oracle.com/en/operating-systems/oracle-linux/9/monitoring/… Commented Nov 8, 2024 at 22:54
  • @MarcusMüller, I have now added the debug symbols but as you'll see in the above edit, the problem process didn't have symbols to share which I think is part of the problem. Commented Nov 9, 2024 at 11:03

1 Answer

In a word, hacked. Thanks to everyone who contributed here (and in the prior location for the question on serverfault) for helping me figure it out.

It started with @tink's point about it being weird that libsystem.so was in /usr/local/lib. Looking more closely at that file, I found that its date was from 2005, that the owner and group were not root, and that the i (immutable) and a (append-only) attributes had been set on it. That was enough red flags for me to start treating this less as a question of what went wrong and more as a question of what an intruder had done.
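
For anyone wanting to look for other files carrying those attributes, here is a rough sketch. `filter_locked` is a hypothetical helper name; it only assumes the usual `lsattr` output of a flags column followed by a path.

```shell
# filter_locked (hypothetical helper): read `lsattr` output on stdin and
# print the paths whose flag column contains i (immutable) or a (append-only).
filter_locked() {
    awk '
        # Skip blank lines and the "directory:" headers that lsattr -R prints;
        # a real entry starts with a column made of letters and dashes.
        NF >= 2 && $1 ~ /^[-A-Za-z]+$/ && $1 ~ /[ia]/ {
            sub(/^[^ ]+ +/, "")   # drop the flag column, keep the path
            print
        }
    '
}

# Typical use as root:
#   lsattr -R /etc /usr/local/lib 2>/dev/null | filter_locked
# Both attributes have to be cleared before such a file can be removed:
#   chattr -i -a /path/to/file
```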

Searching for what might be using that file, I found it referenced in /etc/ld.so.preload, which had the same owner and attributes as libsystem.so and a different, also very old, date. The only thing that file did was list libsystem.so, which makes the dynamic linker load it into every dynamically linked program before anything else.
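
Checking the preload file is quick to do by hand, but a sketch of it might look like the following. `preload_entries` and `inspect_preload` are hypothetical names; `stat` and `rpm -qf` are the ordinary tools, and the assumption is simply that entries are separated by whitespace or colons.

```shell
# preload_entries (hypothetical helper): print each library named in a
# preload file, one per line. Defaults to the system-wide /etc/ld.so.preload.
preload_entries() {
    local file="${1:-/etc/ld.so.preload}"
    [ -r "$file" ] || return 0
    # Entries may be separated by whitespace or colons.
    tr -s ' \t:' '\n' < "$file" | sed '/^$/d'
}

# inspect_preload (hypothetical helper): show ownership, modification time
# and owning package for every preloaded library.
inspect_preload() {
    local lib
    preload_entries "$1" | while IFS= read -r lib; do
        stat -c '%n owner=%U:%G mtime=%y' "$lib"
        rpm -qf "$lib"    # a stray file is reported as not owned by any package
    done
}
```

On a typical install this file either does not exist or is empty, so any entry in it deserves an explanation.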

With both of those removed and the system rebooted, there was no change to the load, but the process that was slamming the CPUs was now visible. Working with perf as @JohnMahowald recommended (before this post was moved from serverfault), I could still see that the process using up the CPUs matched what showed in the report. But the symbols still weren't showing, despite my having installed the kernel debuginfo as suggested by @MarcusMüller. As mentioned at the end of the question, I could still kill the process, which was now visible even though it was showing as "kernel".

Searching through more of the system, I eventually found the real culprit. It was an executable named "kernel" that lived in /etc/. Wait a minute... that means that all the time I was trying to add symbols to explain what the kernel was doing, I wasn't going to find them because it wasn't the real kernel that was maxing out the CPUs - it was a fake kernel. Or, as I realized later, probably just xmrig renamed to "kernel".
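
In hindsight, one quick way to tell a real kernel thread from a userland process that merely calls itself "kernel" is to look at its exe link under /proc. A minimal sketch, with `describe_pid` being a hypothetical helper name:

```shell
# describe_pid (hypothetical helper): report whether a PID is backed by an
# on-disk executable. PROC_ROOT defaults to /proc.
describe_pid() {
    local pid="$1" proc_root="${2:-/proc}" exe
    # Real kernel threads have no exe link to resolve, so readlink fails
    # for them; it also fails without permission to inspect the process.
    if exe=$(readlink "$proc_root/$pid/exe" 2>/dev/null) && [ -n "$exe" ]; then
        printf 'userland %s\n' "$exe"
    else
        printf 'kernel-thread-or-unreadable\n'
    fi
}

# Typical use as root:
#   describe_pid 1399
```

Genuine kernel threads also show a parent PID of 2 in ps, as the bracketed xfs entries in the question do, so a "kernel" with any other parent is suspect.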

The next step was to find out what was launching "kernel", and that took a while. It was /lib/systemd/system/systemboot.service. Having that unit file in that directory was enough for systemd to launch "kernel".
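
Finding the launcher could probably have gone faster by searching the unit directories for the binary's path. This is a sketch under that assumption, with `units_launching` being a made-up helper name:

```shell
# units_launching (hypothetical helper): list unit files under the given
# directories whose Exec* lines mention BINARY (matched as a literal string).
# Usage: units_launching BINARY DIR...
units_launching() {
    local binary="$1"
    shift
    grep -rE '^Exec[A-Za-z]*=' "$@" 2>/dev/null \
        | grep -F -- "$binary" \
        | cut -d: -f1 \
        | sort -u
}

# Typical use as root:
#   units_launching /etc/kernel /etc/systemd/system /usr/lib/systemd/system
# Then stop and disable whatever turns up before deleting it, for example:
#   systemctl disable --now systemboot.service
```

Running `rpm -qf` on any unit file it reports is a reasonable follow-up, since a unit that no package owns is a good candidate for closer inspection.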

Now with all of those components removed, the system is working well. I have since discovered, in an old log file that had been rotated out, "miner" entries, which is what led to the conclusion that this was an xmrig cryptojack. And although I still don't know how the intruder gained access, I have closed off the ways they might have used.

There are some more specifics that I could share, but I don't want to help the criminals out. At the same time, this kind of thing is likely to happen to others (I'm sure I'm not special in being a target of this particular variant of xmrig cryptojacking), so hopefully the info in the question, the comments, and this answer is enough to resolve the problem for somebody else.
