3

I know, this is the 1000th question about the same topic. I've spent days reading so many threads, but I still could not find an answer to my weird situation.

When I had 16GB of RAM on my machine, everything was fine. After I upgraded to 32GB, the kernel started eating all the RAM it could (~25GB), leaving no space for applications. free and atop report this memory usage as buffer/cache.
Cleaning the cache gives me a few GB back, but only for a few minutes. I tried closing all applications except 2 SSH sessions. Same result. With fewer than 10 apps running, I still have about 20GB of buffer/cache eaten by the kernel. The only way to get rid of it is to reboot.

I'm running Linux Mint 18 with kernel 4.4.0-79-generic. For information, my system starts with less than 2GB of used RAM (no hungry app running in the background).

Does anybody have an idea how to identify the leak?

[SOLVED] - the cause was identified: it was bootchart (my comment about it is in my own answer below)

7
  • 4
    Buffer/cache is not a problem. You should only consider 'used'. Buffered and cached memory speeds up your computer but will be released if other apps need it. Commented Jun 10, 2017 at 14:15
  • 1
    If the problem is that it starts swapping a lot, like you mentioned in the other comment (but not in the question), I suspect that the real problem is somewhere else. First thing I'd do is to disable swapping (swapoff) and see what happens to your memory usage. Commented Jun 10, 2017 at 14:50
  • 2
    If you're concerned about potential kernel bugs, you should post /proc/meminfo. If the memory is tied up in slabs, there's a nice tool called slabtop. "Cached" appears to be defined as including all slabs, not just reclaimable ones - utcc.utoronto.ca/~cks/space/blog/linux/FreeAndMeminfo Commented Jun 10, 2017 at 14:51
  • Even if you're not concerned about a bug, but are asking the question of where your memory is going, you should still provide /proc/meminfo. The output of ps -e -O trs,drs,sz,rss,vsz,size will likely also be useful. Commented Jun 10, 2017 at 17:32
  • 1
    @JoeP That will only work if overcommit is disabled. If overcommit is enabled, the OOM killer is going to swing in and start killing things. Hopefully it'll kill the stress process, but it might not. Commented Jun 10, 2017 at 18:06

3 Answers

6

This isn't a problem - Linux is working as designed to improve performance, without impacting any applications needing RAM.

See Help! Linux ate my RAM!.
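To check whether the buffer/cache figure really is reclaimable, it can help to split it up using /proc/meminfo. The following is a minimal sketch, not a definitive tool: the function names are made up for illustration, and the "droppable" figure is only a rough estimate that assumes Shmem (tmpfs and shared memory, which is counted inside Cached) cannot be dropped.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def cache_breakdown(info):
    """Split buffer/cache into a shmem part and a roughly droppable part (kB).

    Assumption: buff/cache is taken as Buffers + Cached + SReclaimable,
    and Shmem is treated as not droppable.
    """
    shmem = info.get("Shmem", 0)
    cache = (info.get("Buffers", 0)
             + info.get("Cached", 0)
             + info.get("SReclaimable", 0))
    return {
        "buff_cache_kb": cache,
        "shmem_kb": shmem,
        "droppable_kb": cache - shmem,
        "available_kb": info.get("MemAvailable"),
    }
```

On a live system, pass it the contents of the file, e.g. cache_breakdown(parse_meminfo(open("/proc/meminfo").read())). If shmem_kb accounts for most of buff_cache_kb, the memory is not ordinary file cache and will not be released just because an application asks for RAM.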

2
  • 1
    I agree that Linux is supposed to use as much RAM as it can for buffers and cache, but it also MUST release it for applications to use. It has always been this way. But this time, it does not release memory. I have to reboot twice a day to let the system start an application, otherwise it starts swapping like crazy and the machine is unusable for hours (a hard reboot is the only way out). So it is DEFINITELY A HUGE PROBLEM Commented Jun 10, 2017 at 14:34
  • 3
    What you've just described is definitely a problem. But it's not caused just by Linux using memory for buffers & cache, it must be something more specific. Can you edit your question to include this information? Commented Jun 10, 2017 at 17:19
3

I reinstalled Mint 18.1, and the issue disappeared. Memory usage stays < 3GB, instead of 25~30GB as before.

My understanding of what happened, in case this helps anyone else, is that it was a memory leak in the kernel. I doubt that the kernel itself is leaking, but a driver could be the cause.

I found an interesting document about kernel memory leaks: https://01.org/linuxgraphics/gfx-docs/drm/dev-tools/kmemleak.html. I'm not sure which kernel versions enable it, but the stock 4.10 kernel on Ubuntu doesn't (you must recompile the kernel with the option enabled).

Following this idea, I finally found the cause. A few weeks ago I installed bootchart. It does not appear as a system application. I installed it again, and the result is quite straightforward: memory usage keeps slowly growing as more processes are launched.

Same issue is described here: https://forums.linuxmint.com/viewtopic.php?t=226774

1
  • Cool! IMO this answer is unclear - this was not a leak in the kernel. See askubuntu.com/questions/762717/high-shmem-memory-usage It could have been diagnosed if someone had remembered to ask you to run df -h and look at all the tmpfs/devtmpfs :). Commented Jun 12, 2017 at 10:39
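The tmpfs check suggested in the comment above (df -h, looking at tmpfs/devtmpfs) can also be sketched in Python. Files stored on tmpfs live in RAM, are counted under Shmem (and inside Cached) in /proc/meminfo, and are not freed by dropping caches. This is only an illustrative sketch; the function names are hypothetical, and mount points containing spaces (escaped as \040 in /proc/mounts) are not handled.

```python
import os

TMPFS_TYPES = {"tmpfs", "devtmpfs"}


def tmpfs_mount_points(mounts_text):
    """Return mount points of tmpfs/devtmpfs filesystems from /proc/mounts text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line is: device mountpoint fstype options dump pass
        if len(fields) >= 3 and fields[2] in TMPFS_TYPES:
            points.append(fields[1])
    return points


def used_bytes(path):
    """Bytes in use on the filesystem containing path (same idea as df)."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def tmpfs_usage(mounts_path="/proc/mounts"):
    """List (used_bytes, mount_point) for every tmpfs mount, largest first."""
    with open(mounts_path) as f:
        points = tmpfs_mount_points(f.read())
    usage = []
    for point in points:
        try:
            usage.append((used_bytes(point), point))
        except OSError:
            # Mount point not accessible to this user; skip it.
            continue
    return sorted(usage, reverse=True)
```

A tmpfs mount holding many GB at the top of tmpfs_usage() would point at whatever program is writing there, rather than at the kernel.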
1

Cleaning the cache gives me a few GB back, but only for a few minutes. I tried closing all applications except 2 SSH sessions. Same result.

As you describe it, it does sound like there's something to identify here e.g. background activity. In the second case, it's surprising that you can't clean the cache and get back to around "less than 2GB of used RAM" for a desktop session.

Remember that you can't drop dirty pages from the cache; this would cause data loss. (They should be "cleaned" by writeback, which wakes up every vm.dirty_writeback_centisecs, default 5 seconds.) For completeness, run sync before dropping the cache.
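The sync-then-drop sequence can be sketched as follows. This is a minimal example, assuming it runs as root on Linux; the function name and the path parameter are only there for illustration (the parameter makes the function easy to try against an ordinary file).

```python
import os


def drop_caches(level=3, path="/proc/sys/vm/drop_caches"):
    """Flush dirty pages to disk, then ask the kernel to drop clean caches.

    level 1 drops the page cache, 2 drops dentries and inodes, 3 drops both.
    Writing to the real path requires root.
    """
    if level not in (1, 2, 3):
        raise ValueError("level must be 1, 2 or 3")
    os.sync()  # write out dirty pages first so they become droppable
    with open(path, "w") as f:
        f.write(f"{level}\n")
```

Comparing free before and after the call shows how much of buffer/cache was actually droppable; whatever remains is held by something else.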

Generally the page cache should be populated by application reads and writes. If you're a master of atop (I'm not), maybe it will enlighten you. Otherwise - iotop will show bandwidth per process. Do your best cache-clearing dance, wait for the desktop to recover, then watch what shows in iotop.

iotop -b provides a batch mode, so you don't lose the output afterwards.

For example, you could see these statistics if you have a backup configured to run at this time. (Some backup tools deliberately try to avoid using the page cache, in fear of filling it up and evicting e.g. GUI apps).
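If iotop is not installed, a rough equivalent can be built from the per-process counters in /proc/<pid>/io. The sketch below is only an approximation of what iotop reports, with hypothetical function names; it assumes the kernel has task I/O accounting enabled, and reading other users' processes needs root.

```python
import os


def parse_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            counters[key.strip()] = int(value)
    return counters


def io_delta(before, after):
    """Bytes read from and written to storage between two snapshots."""
    return {
        key: after[key] - before[key]
        for key in ("read_bytes", "write_bytes")
        if key in before and key in after
    }


def snapshot(proc_root="/proc"):
    """Collect I/O counters for every process we are allowed to read."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                result[int(entry)] = parse_io(f.read())
        except OSError:
            # Process exited or permission denied; skip it.
            continue
    return result
```

Take one snapshot() right after clearing the cache and another a few minutes later, then apply io_delta to each PID present in both. The processes with the largest read_bytes are the ones refilling the page cache.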

1
  • I stopped all applications; before writing this message, I had about 10 processes running (init, ssh, bash and htop). I've spent quite a long time stopping all processes to prove it is related to the kernel, and not to any kind of application. I've been using Linux for 20 years, and I have never seen such crazy memory use. But I'm not a specialist in how caches work. Anyway, I was wondering if there is a way to diagnose it - until now, it's a black hole. If not, I will reinstall the machine and stop losing time on that. Commented Jun 10, 2017 at 16:04
