
With cgroups v1, it is possible to listen for events about memory pressure. According to the docs, one needs to

  • Create a new eventfd
  • Open memory.pressure_level for reading
  • Open cgroup.event_control for writing
  • Write {eventfd} {pressure_level_fd} {level} (where level is low, medium, or critical) to event_control
  • Wait until reading from the eventfd returns 8 bytes
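As a rough illustration of the steps above, here is a minimal sketch using only the standard library. The function names are made up for this example, and creating the eventfd itself (eventfd(2), for instance through the libc crate) is assumed to happen elsewhere and is not shown.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read, Write};
use std::os::unix::io::{AsRawFd, RawFd};
use std::path::Path;

/// Builds the line written to cgroup.event_control:
/// "{eventfd} {pressure_level_fd} {level}".
fn event_control_line(event_fd: RawFd, pressure_fd: RawFd, level: &str) -> String {
    format!("{event_fd} {pressure_fd} {level}")
}

/// Registers `event_fd` for `level` notifications ("low", "medium" or "critical").
/// `cgroup_dir` is a hypothetical cgroup v1 memory directory, and `event_fd` is
/// assumed to wrap a descriptor obtained from eventfd(2).
/// The opened memory.pressure_level file is returned so the caller can keep it open.
fn register_pressure_listener(cgroup_dir: &Path, event_fd: &File, level: &str) -> io::Result<File> {
    let pressure = File::open(cgroup_dir.join("memory.pressure_level"))?;
    let mut control = OpenOptions::new()
        .write(true)
        .open(cgroup_dir.join("cgroup.event_control"))?;
    let line = event_control_line(event_fd.as_raw_fd(), pressure.as_raw_fd(), level);
    control.write_all(line.as_bytes())?;
    Ok(pressure)
}

/// Blocks until 8 bytes can be read and returns them as the eventfd counter,
/// i.e. the number of notifications since the previous read.
fn read_event_count<R: Read>(source: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    source.read_exact(&mut buf)?;
    Ok(u64::from_ne_bytes(buf))
}
```

Calling read_event_count in a loop on the eventfd then yields one wake-up per batch of pressure events.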

When you do this with a program that is about to run out of memory, you receive a long train of low events, then a few medium and critical ones, before the OOM killer finally runs. If you want to convince yourself of this, I've prepared a little Rust example; you can run it with cargo build --release --examples && sudo target/release/examples/cv1.

For cgroups v2 (docs), similar events can be received by

  • Setting up an inotify watch on memory.events.local
  • Reading and parsing the file fully, comparing numbers after each event received
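The "parse and compare" step could look roughly like the sketch below. It assumes the file keeps the flat "key value" per line layout described in the docs; the function names are invented for this example, and the inotify watch that tells you when to re-read the file is not shown.

```rust
use std::collections::BTreeMap;

/// Parses the flat "key value" lines of memory.events.local into a map.
/// Lines that do not match that shape are skipped.
fn parse_events(contents: &str) -> BTreeMap<String, u64> {
    contents
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let key = parts.next()?;
            let value = parts.next()?.parse().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}

/// Returns every counter that grew between two snapshots, with its increase.
/// Counters missing from the old snapshot are treated as 0.
fn increased(old: &BTreeMap<String, u64>, new: &BTreeMap<String, u64>) -> Vec<(String, u64)> {
    new.iter()
        .filter_map(|(key, &value)| {
            let before = old.get(key).copied().unwrap_or(0);
            if value > before {
                Some((key.clone(), value - before))
            } else {
                None
            }
        })
        .collect()
}
```

After each inotify event, re-read the file, call parse_events on it, and compare against the previous snapshot with increased; a non-empty result for high or max is the signal of interest.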

This works (even without root, unlike v1) when a memory limit is set on the cgroup, and you'll usually receive at least a few inotify events with increases in either high or max. (Again, if you want to convince yourself of this, run systemd-run --same-dir --pty --user -p MemoryMax=1G cargo run --example cv2 on the above gist.)

However, when no memory limit is set, or the limit is higher than the available memory, the process is killed without any events being received. Looking at memory.pressure shows a strong increase, so the kernel definitely knows that something is up before it invokes the OOM killer. Is there a way to get it to tell us, ideally with behavior like cgroups v1, which gives plenty of warnings ahead of time?

Note: I'm aware of some related questions (1, 2), but:

  • They're old, and the questions and answers only consider cgroups v1
  • I'd like to be notified before the OOM killer becomes active, so the hack of "spawning a high oom_score_adj canary process" is out.

2 Answers

1

After reading the docs a bit more carefully, I found an answer. To summarize, you can watch for memory pressure by writing a trigger such as some 50000 2000000\0 (at least 50 ms of stall time within a 2 s window) to an fd of /proc/pressure/memory and then polling the fd for POLLPRI events.
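A minimal sketch of the registration half, using only the standard library, might look like this. The function names are invented for the example, and the waiting half needs poll(2) with POLLPRI (for instance through the libc crate), which is assumed and not shown here.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Formats a PSI trigger. `kind` is "some" or "full"; both times are in
/// microseconds. The trailing NUL byte mirrors the string quoted above.
fn psi_trigger(kind: &str, stall_us: u64, window_us: u64) -> String {
    format!("{kind} {stall_us} {window_us}\0")
}

/// Opens a pressure file (for example /proc/pressure/memory) for reading and
/// writing, and writes the trigger to it. The returned file must stay open:
/// it is the descriptor to poll for POLLPRI.
fn register_trigger(pressure_file: &Path, trigger: &str) -> io::Result<File> {
    let mut file = OpenOptions::new()
        .read(true)
        .write(true)
        .open(pressure_file)?;
    file.write_all(trigger.as_bytes())?;
    Ok(file)
}
```

With these, register_trigger(Path::new("/proc/pressure/memory"), &psi_trigger("some", 50_000, 2_000_000)) would set up the trigger from this answer.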

I'm not marking it as accepted, since I'm not happy with it:

  • The cgroups-based variant of this (at $CGROUP_FS/$GROUP/memory.pressure) is only available to root. If you use the global variant at /proc/pressure/ instead, you might receive events even though your process still has lots of memory available.
  • The minimum interval for non-root users is 2 seconds. On systems without swap, this means that you might get OOM-killed before the first event.
1

The answer is to first read /proc/self/cgroup, trim the leading 0:: and the trailing \n, then append that path to /sys/fs/cgroup, followed by /memory.pressure. The result looks like this: /sys/fs/cgroup${cgroup path}/memory.pressure. Now you can open the file and write a trigger as per the docs.
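The path construction can be sketched as follows. This is not the code from the linked repository, just an illustration; the function name is made up, and it assumes a cgroup v2 (unified) hierarchy mounted at /sys/fs/cgroup.

```rust
use std::path::PathBuf;

/// Given the contents of /proc/self/cgroup, returns the path of this
/// process's memory.pressure file, or None if there is no "0::" entry
/// (for example on a pure cgroup v1 system).
fn memory_pressure_path(proc_self_cgroup: &str) -> Option<PathBuf> {
    // `lines()` already drops the trailing newline.
    let line = proc_self_cgroup
        .lines()
        .find(|line| line.starts_with("0::"))?;
    let cgroup = line.strip_prefix("0::")?;
    Some(PathBuf::from(format!("/sys/fs/cgroup{cgroup}/memory.pressure")))
}
```

Typical use would be memory_pressure_path(&std::fs::read_to_string("/proc/self/cgroup")?), followed by opening the resulting file for reading and writing.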

Sample code here: https://github.com/SUPERCILEX/clipboard-history/blob/e91b80a482e019f1c3af5b3898678ee3aa8384c1/server/src/reactor.rs#L134-L166


Edit: code for my reply to a comment, posted here so the formatting works:

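use std::collections::LinkedList;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Instant;

// Shared flag that tells the allocating thread to stop.
let stop = Arc::new(AtomicBool::new(false));

// Background thread that eats memory until told to stop.
thread::spawn({
    let stop = stop.clone();
    move || {
        let time = Instant::now();
        let mut ll = LinkedList::new();
        loop {
            if stop.load(Ordering::Relaxed) {
                break;
            }

            // Each node holds a large array, so memory use grows quickly.
            ll.push_back([time.elapsed(); 10000]);
        }
    }
});

// --- In the poll fd handler for low memory

// Memory pressure was reported: stop allocating.
stop.store(true, Ordering::Relaxed);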
4
  • I'd like to ask you two things: 1. This works for non-root users? For me, that file is 644 owned by root. 2. You actually get events before the system comes to a crawl or the process is oom-killed? Commented Aug 19, 2024 at 0:49
  • For me this works without root, but I'm on a systemd system and maybe that matters? Here's what an actual path looks like for me: /sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-135ff7ca-4723-479f-9f5c-9cb2929c5c73.scope/memory.pressure. Commented Aug 20, 2024 at 2:46
  • As for OOM jankiness, it works flawlessly. This test successfully aborts after going a few gigs into swap: see code in my original answer since I couldn't figure out formatting. Commented Aug 20, 2024 at 2:47
  • Ok, I take back my comments about this working without root. It does, but you need something to have set up a cgroup that has been chowned for user access. Issue where I debugged this: github.com/SUPERCILEX/clipboard-history/issues/… Commit to fix it: github.com/SUPERCILEX/clipboard-history/commit/… Commented Aug 21, 2024 at 19:20
