
How can I best debug the following issue from here on?

I have a

  • python application
  • running in a podman container, which does
  • plenty of disk I/O due to numpy memmaps of two huge files: an input file that I read in chunks, processing each chunk and writing the result into the other memory-mapped file (a minimal sketch of this pattern follows this list).
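The processing loop looks roughly like this (file names, dtype and chunk size here are assumptions for illustration, not my actual code):

import numpy as np

CHUNK = 1_000_000                       # elements per chunk (assumed)
src = np.memmap("input.bin", dtype=np.float32, mode="r")
dst = np.memmap("output.bin", dtype=np.float32, mode="r+", shape=src.shape)

for start in range(0, src.shape[0], CHUNK):
    stop = min(start + CHUNK, src.shape[0])
    dst[start:stop] = src[start:stop] * 2.0   # stand-in for the real processing
dst.flush()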

I noticed that running the application leads to

  • increasing total RAM usage, eventually leading to
  • OOM situations on my system.

I tried finding the cause but

  • couldn't find any references in my Python code that would leak memory.

According to sudo htop

  • the application itself does not appear to consume the memory if I account for resident memory size only, which htop defines as text sections + data sections + stack usage (the /proc/<PID>/status probe after these readings looks at the same per-process numbers).

  • Mem 11.0G / 14.6G

  • 18.7 MEM% (of ~16G RAM)

later:

  • Mem 14.3G / 14.6G
  • 0.1 MEM% (of ~16G RAM)
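To see where the process's memory actually goes, the same accounting can be read straight from /proc/<PID>/status; VmPTE there is the per-process page-table size (PID 9874 is my process from the ps output below, adjust as needed):

PID = 9874
fields = ("VmRSS", "RssAnon", "RssFile", "RssShmem", "VmPTE", "VmSwap")
with open(f"/proc/{PID}/status") as f:
    for line in f:
        if line.startswith(fields):
            print(line.rstrip())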

According to ps auwx:

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
user 9874 44.9 0.0 14651602572 8092 pts/0 DNsl+ 10:40 275:13 /root/.local/share/virtualenvs/app-4PlAip0Q/bin/python main.py

# /proc/meminfo
MemTotal:       15258424 kB
MemFree:          143284 kB
MemAvailable:          0 kB
Buffers:             304 kB
Cached:            51496 kB
SwapCached:         1380 kB
Active:              812 kB
Inactive:           5800 kB
Active(anon):          0 kB
Inactive(anon):     1492 kB
Active(file):        812 kB
Inactive(file):     4308 kB
Unevictable:       46068 kB
Mlocked:               0 kB
SwapTotal:      67108860 kB
SwapFree:       66860540 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:               144 kB
Writeback:           124 kB
AnonPages:          1276 kB
Mapped:             2916 kB
Shmem:             46020 kB
KReclaimable:      40196 kB
Slab:            1482400 kB
SReclaimable:      40196 kB
SUnreclaim:      1442204 kB
KernelStack:        3984 kB
PageTables:     13424812 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    74737048 kB
Committed_AS:    1396560 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       54808 kB
VmallocChunk:          0 kB
Percpu:             2496 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
HugePages_Total:       1
HugePages_Free:        1
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:            2048 kB
DirectMap4k:      464816 kB
DirectMap2M:    14129152 kB
DirectMap1G:     2097152 kB

Screenshots

[htop screenshot]

  • The process is in a cgroup with /sys/fs/cgroup/ramlimit/memory.max set to 10737418240 (10G), which somehow has no effect. When I add CPU time constraints to the ramlimit group, those do work, so I assume I am using cgroup v2 correctly (a quick check is sketched after these comments). Commented May 24, 2024 at 2:16
  • Initially I created the container with a memory limit, but podman warned me about falling back to cgroupfs since I had no systemd session; maybe this interferes with the cgroup I set after the fact to try to fix it. Commented May 24, 2024 at 2:18
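A quick way to check whether the limit actually applies to the process (paths as in the comment above; the PID is an assumption):

PID = 9874                              # the python process (assumed PID)
base = "/sys/fs/cgroup/ramlimit"
for name in ("cgroup.procs", "memory.max", "memory.current"):
    with open(f"{base}/{name}") as f:
        print(name, "->", f.read().strip())
# memory.max can only act on the process if its PID appears in cgroup.procs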

1 Answer


The page table entries are probably what fills up most of the RAM: PageTables accounts for ~13.4 GB according to /proc/meminfo. I can't tell why memory-mapping the files leads to this situation, where apparently a lot of pages have to stay mapped and never get flushed and discarded.
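One thing that might help (a sketch of a possible mitigation, not something the question confirms; names, dtype and sizes are assumptions) is to map only a window of each file at a time instead of keeping both files fully mapped. When a window is dropped it is unmapped, which also releases its page-table entries:

import numpy as np

ITEM = np.dtype(np.float32).itemsize    # dtype is an assumption
CHUNK = 1_000_000                       # elements per window (assumed)
TOTAL = 400_000_000                     # total elements in the files (assumed)

for start in range(0, TOTAL, CHUNK):
    n = min(CHUNK, TOTAL - start)
    src = np.memmap("input.bin", dtype=np.float32, mode="r",
                    offset=start * ITEM, shape=(n,))
    dst = np.memmap("output.bin", dtype=np.float32, mode="r+",
                    offset=start * ITEM, shape=(n,))
    dst[:] = src * 2.0                  # stand-in for the real processing
    dst.flush()
    del src, dst                        # dropping the arrays unmaps the windows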
