
I tested out echo 2 > /proc/sys/vm/overcommit_memory, which I know isn't a commonly used or recommended mode, but for various reasons it could be beneficial for some of my workloads.
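
For completeness, this is roughly how I apply the setting; the /etc/sysctl.d path is an assumption about a typical systemd-based setup, so adjust as needed:

# check the current mode, set it for the running system, then persist it
cat /proc/sys/vm/overcommit_memory
sysctl -w vm.overcommit_memory=2                                     # as root
echo 'vm.overcommit_memory = 2' > /etc/sysctl.d/90-overcommit.conf   # as root
sysctl --system                                                      # reload sysctl.d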

However, when I tried this on a desktop system with 15.6 GiB of RAM, most programs would already start crashing or erroring with barely a quarter of memory used, and Brave would fail to open tabs:

$ dmesg
...
[24551.333140] __vm_enough_memory: pid: 19014, comm: brave, bytes: 268435456 not enough memory for the allocation
[24551.417579] __vm_enough_memory: pid: 19022, comm: brave, bytes: 268435456 not enough memory for the allocation
[24552.506934] __vm_enough_memory: pid: 19033, comm: brave, bytes: 268435456 not enough memory for the allocation
$ ./smem -tkw
Area                           Used      Cache   Noncache 
firmware/hardware                 0          0          0 
kernel image                      0          0          0 
kernel dynamic memory          4.0G       3.5G     519.5M 
userspace memory               3.4G       1.3G       2.1G 
free memory                    8.2G       8.2G          0 
----------------------------------------------------------
                              15.6G      13.0G       2.7G 

I understand that with memory overcommit disabled, the use of fork() instead of vfork(), which many Linux programs suboptimally do, can cause issues once the forking process has a lot of memory allocated, since the whole address space has to be accounted for again. But that doesn't seem to be the case here, since 1. the affected processes appear to use at most a few hundred megabytes of memory, 2. the allocation that dmesg reports as failing is far smaller than what's listed as free, and 3. overall system memory doesn't even appear to be a quarter full.

Some more system info:

# /sbin/sysctl vm.overcommit_ratio vm.overcommit_kbytes vm.admin_reserve_kbytes vm.user_reserve_kbytes
vm.overcommit_ratio = 50
vm.overcommit_kbytes = 0
vm.admin_reserve_kbytes = 8192
vm.user_reserve_kbytes = 131072

I'm therefore wondering what the cause is here. Is there some obvious reason for this, perhaps some misconfiguration on my part that could be improved?

Update: part of the problem turned out to be vm.overcommit_ratio, which @StephenKitt helped me track down and which needed adjusting like this:

echo 2 > /proc/sys/vm/overcommit_memory
echo 100 > /proc/sys/vm/overcommit_ratio
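
To confirm the change took effect and to see how much headroom remains, the resulting limit and the current commit charge can be read back directly (plain sysctl and /proc reads, nothing specific to this setup):

# /sbin/sysctl vm.overcommit_memory vm.overcommit_ratio
$ grep -E 'CommitLimit|Committed_AS' /proc/meminfo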

But now I seem to be hitting another wall. At first I thought it was the fork() vs. vfork() issue, but instead the failures seem to start once application memory usage reaches the kernel's dynamic memory:

[screenshot: memory usage graph at the point where the allocations start failing]

I'm guessing it isn't intended for the kernel to keep sitting on more than 6 GiB of dynamic memory forever without it being usable. Does anybody have an idea why it behaves like this with overcommitting disabled? Perhaps I'm missing something here.

Update 2:

Here's more information collected when hitting this weird condition again, where the dynamic kernel memory won't get out of the way:

[32915.298484] __vm_enough_memory: pid: 24347, comm: brave, bytes: 268435456 not enough memory for the allocation
[32916.293690] __vm_enough_memory: pid: 24355, comm: brave, bytes: 268435456 not enough memory for the allocation
# exit
~/Develop/smem $ ./smem -tkw
Area                           Used      Cache   Noncache 
firmware/hardware                 0          0          0 
kernel image                      0          0          0 
kernel dynamic memory          7.8G       7.4G     384.0M 
userspace memory               5.2G       1.5G       3.7G 
free memory                    2.7G       2.7G          0 
----------------------------------------------------------
                              15.6G      11.5G       4.1G 
~/Develop/smem $ cat /proc/sys/vm/overcommit_ratio
100
~/Develop/smem $ cat /proc/sys/vm/overcommit_memory
2
~/Develop/smem $ cat /proc/meminfo 
MemTotal:       16384932 kB
MemFree:         2803496 kB
MemAvailable:   10297132 kB
Buffers:            1796 kB
Cached:          8749580 kB
SwapCached:            0 kB
Active:          7032032 kB
Inactive:        4760088 kB
Active(anon):    4698776 kB
Inactive(anon):        0 kB
Active(file):    2333256 kB
Inactive(file):  4760088 kB
Unevictable:      825908 kB
Mlocked:            1192 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               252 kB
Writeback:             0 kB
AnonPages:       3866720 kB
Mapped:          1520696 kB
Shmem:           1658104 kB
KReclaimable:     570808 kB
Slab:             743788 kB
SReclaimable:     570808 kB
SUnreclaim:       172980 kB
KernelStack:       18720 kB
PageTables:        53772 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    18482080 kB
Committed_AS:   17610184 kB
VmallocTotal:   261087232 kB
VmallocUsed:       86372 kB
VmallocChunk:          0 kB
Percpu:              864 kB
CmaTotal:          65536 kB
CmaFree:             608 kB
  • @StephenKitt it turns out you got me on the right track after all; the problem was the vm.overcommit_ratio option. After adjusting it, everything works as expected. Commented Jul 11 at 6:05
  • Ah, good to know, I didn’t realise your allocation was large enough to hit that limit! Commented Jul 11 at 6:06
  • I wouldn't have thought that it was, but I suppose I understand fairly little of how exactly everything is calculated in the kernel. Commented Jul 11 at 9:30
  • Show /proc/meminfo Commented Jul 12 at 6:48
  • @AlexD I added the information you requested. Thank you for your input. Commented Jul 13 at 1:51

1 Answer


Disabling overcommit doesn't change programs' behaviour or the kernel's overall memory-management strategy.

The only thing this setting changes is that the kernel now enforces a limit on the total memory allocation in the system. The limit is CommitLimit in /proc/meminfo, and the total memory allocation is Committed_AS.

In your case, CommitLimit = 18482080 kB, which is calculated as MemTotal * vm.overcommit_ratio / 100 + SwapTotal (16384932 kB * 100 / 100 + 2097148 kB). When a program tries to allocate memory such that Committed_AS would exceed CommitLimit, the allocation fails with a message like:

__vm_enough_memory: pid: 24347, comm: brave, bytes: 268435456 not enough memory for the allocation
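
As a sanity check, you can recompute that limit from the same values; a rough sketch that ignores the hugetlb reservation the kernel also subtracts and assumes vm.overcommit_kbytes is 0 (it is on your system):

$ awk -v ratio=100 '/^MemTotal:/{m=$2} /^SwapTotal:/{s=$2} END{print m*ratio/100 + s, "kB"}' /proc/meminfo
18482080 kB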

The thing is that allocated memory doesn't equal used memory, and programs allocate more memory than they actually use. As you are aware, a program that uses 1 GB of memory has 2 GB allocated right after a fork(), even though it still only uses the same 1 GB: Committed_AS increases by 1 GB, but MemFree stays the same. Until MemFree hits a low watermark (probably around 64 MB on your system), the kernel won't start swapping or discarding caches. And the kernel doesn't care that you have almost 3 GB in MemFree and 10 GB in MemAvailable when you hit CommitLimit.
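
If you want to watch this happen live, compare the commit charge against the limit rather than against the free-memory counters (assuming watch is installed):

$ watch -n1 'grep -E "CommitLimit|Committed_AS|MemFree|MemAvailable" /proc/meminfo'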

CommitLimit is about something different. It makes the kernel guarantee that all allocated memory can actually be used in the future, even if some of it is unused right now, and it won't allow allocating more memory than the limit. This is useful for workloads where the program manages its own memory allocation, limits it to some configured amount, and can handle a failing malloc() gracefully.

On a modern desktop, where programmers don't care about a failing malloc(), disabling overcommit is shooting yourself in the foot. As you can observe, memory allocations start failing long before memory is exhausted.

  • The failed allocation seems to be roughly 200 MB. Given the output of smem, this memory should likely be available in this situation. While I understand programs often allocate more than they need, I can't see how the output I provided indicates that this is the concrete problem here. Rather, it seems like the allocations fail exactly as soon as the memory not used up by "kernel dynamic memory" is occupied. However, a large amount of that dynamic memory is declared by smem as cached, which should presumably be at least partly evictable. This is why the situation continues to confuse me. Commented Jul 13 at 10:00
  • To quote actual numbers, the failed allocation seems to be ~200-300MiB as per dmesg. MemFree is roughly 280MiB, suggesting this might be the limit that was hit. However, Cached indicates cached potentially evictable memory of 8GiB+, and Unevictable is merely ~800MiB, and MemAvailable also indicates multiple gigabytes supposedly available. Zswap, which is configured as a way smaller volume than the 8GiB, even seems to be empty with nothing swapped yet. Given all these numbers, that the allocation simply fails isn't what I would expect. Hence I'm here, looking for answers. Commented Jul 13 at 10:08
  • I'll repeat. When you are hitting CommitLimit, all the numbers like MemFree, MemAvailable don't matter. Commented Jul 13 at 10:12
  • Thank you for clarifying. My apologies, after checking again it seems Committed_AS is indeed taking up 16 GiB. I suppose I have to accept that this is somehow by design not reflected in the other numbers, so it seems you are correct. I'll mark your response as such, thank you! I realize this is off-topic, but is it possible to identify the processes that are currently the worst offenders at committing more than they need, at a specific moment? Commented Jul 13 at 10:15
  • You can probably get per-process information from /proc/*/smaps, the same way smem does (see the sketch after these comments for one possible starting point). You can ask a separate question about this. Commented Jul 13 at 10:25

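For reference, here is a rough sketch along those lines: it sums the Size of private writable mappings from each process's /proc/<pid>/smaps as a crude approximation of that process's contribution to Committed_AS (run it as root; the proxy and the output format are assumptions, not exact kernel accounting):

# list the ten largest committers, by total size of private writable mappings
for d in /proc/[0-9]*; do
  awk -v pid="${d#/proc/}" '
    /^[0-9a-f].* rw.p / { acct = 1; next }   # header of a private writable mapping
    /^[0-9a-f]/         { acct = 0; next }   # header of any other mapping
    acct && /^Size:/    { total += $2 }      # mapping size in kB
    END { if (total) printf "%s %d kB\n", pid, total }
  ' "$d/smaps" 2>/dev/null
done | sort -k2 -rn | head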