
I see on the web many conflicting or unclear descriptions of the memory layout of a Linux process. The most common diagram looks like this:

[Image: diagram of the process memory layout]

And a common description would say that:

The data segment contains only global or static variable which have a predefined value and can be modified. Heap contains the dynamically allocated data that is stored in a memory section we refer that as heap section and this section typically starts where data segments ends.

And also:

The heap is, generally speaking, one specific memory region created by the C runtime, and managed by malloc (which in turn uses the brk and sbrk system calls to grow and shrink).

mmap is a way of creating new memory regions, independently of malloc (and so independently of the heap). munmap is simply its inverse, it releases these regions.

Many of those explanations seem outdated, and I find many discrepancies. For instance, many articles, like the answer quoted above, claim that the heap is used by malloc, but malloc is actually a library call that uses either sbrk or mmap, as the malloc man page says:

Normally, malloc() allocates memory from the heap, and adjusts the size of the heap as required, using sbrk(2). When allocating blocks of memory larger than MMAP_THRESHOLD bytes, the glibc malloc() implementation allocates the memory as a private anonymous mapping using mmap(2).

So if malloc is in many cases implemented with mmap, what's the difference between the heap and the mmap area?

Another apparent contradiction is that many articles (and the malloc man page itself) claim that brk/sbrk adjust the size of the heap, but their own man page says they actually adjust the size of the data segment:

brk() and sbrk() change the location of the program break, which defines the end of the process's data segment (i.e., the program break is the first location after the end of the uninitialized data segment).

So I'm looking for a clear, up-to-date overall explanation of the memory layout of a process today, covering the different segments, that also addresses these questions:

  1. What is the difference between the heap and the mmap areas? (In some tests I ran, comparing the addresses I got from mmap with the range of the heap in /proc/self/maps, it seems that some mmap-allocated pages are actually allocated inside the heap segment.)
  2. Does the break signify the end of the data segment, or the end of the heap?


  • This question seems better suited for Stack Overflow. Commented Feb 13, 2024 at 14:03
  • @td211: I disagree.  Stack Overflow is specifically about programming.  While this question is framed in terms of Linux, and has had the [linux-kernel] tag added by a reviewer, it is really more about architecture, concepts and terminology.  It might fit on Computer Science Stack Exchange. Commented Feb 13, 2024 at 15:52

1 Answer

Let's look at the diagram in your post. It gives these terms their narrowest definitions. From low addresses to high: text, data, bss, heap (ending at the break pointer), mmap area, stack.

The sizes of data and bss are fixed (they are defined in the ELF file).
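The ordering in the diagram can be checked from inside a process. Here is a minimal sketch, assuming Linux with glibc and a normally launched executable; the names `initialized_global`, `uninitialized_global`, `struct layout`, `sample_layout` and `print_layout` are made up for illustration. It records the address of one object in data, one in bss, and the current break, then moves the break by one page to show that only the break changes.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() under strict -std= modes on glibc */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int initialized_global = 42;  /* has a predefined value: placed in data */
int uninitialized_global;     /* no initializer: placed in bss */

/* Hypothetical helper type: one snapshot of the three addresses. */
struct layout {
    uintptr_t data_addr;
    uintptr_t bss_addr;
    uintptr_t break_addr;
};

struct layout sample_layout(void)
{
    struct layout l;
    l.data_addr  = (uintptr_t)&initialized_global;
    l.bss_addr   = (uintptr_t)&uninitialized_global;
    l.break_addr = (uintptr_t)sbrk(0);   /* sbrk(0) returns the current break */
    return l;
}

void print_layout(void)
{
    struct layout before = sample_layout();
    void *old_break = sbrk(4096);        /* raise the break by one page */
    struct layout after = sample_layout();

    /* Restore the break before calling anything that might use malloc. */
    if (old_break != (void *)-1)
        sbrk(-4096);

    printf("data          0x%" PRIxPTR "\n", before.data_addr);
    printf("bss           0x%" PRIxPTR "\n", before.bss_addr);
    printf("break before  0x%" PRIxPTR "\n", before.break_addr);
    printf("break after   0x%" PRIxPTR "\n", after.break_addr);
}
```

Calling `print_layout()` from `main` should show data below bss and bss below the break, with only the break moving between the two snapshots. Exact addresses vary from run to run because of address space randomization.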

These terms also have broader definitions:

  • heap: sometimes refers to data + bss + heap
  • data segment: sometimes refers to data + bss + heap (like the broader definition of heap above)

Where allocations land:

  • malloc: sometimes in the heap, sometimes in the mmap area (depending on whether the request is below or above MMAP_THRESHOLD)
  • mmap: in the mmap area
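The allocation rules above can be observed with a rough sketch like the following. It assumes Linux with glibc and its default MMAP_THRESHOLD (128 KiB according to the man page); `bss_marker`, `below_break` and `compare_allocations` are hypothetical names, and "below the break" is used here only as an approximation of "inside the brk-managed heap".

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() and MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static char bss_marker;   /* lives in bss, i.e. below the start of the heap */

/* Returns 1 if p lies between the program's bss and the current break. */
int below_break(const void *p)
{
    uintptr_t addr = (uintptr_t)p;
    return addr > (uintptr_t)&bss_marker && addr < (uintptr_t)sbrk(0);
}

/* Hypothetical helper: prints where each kind of allocation landed. */
void compare_allocations(void)
{
    void *small  = malloc(64);                 /* well below the threshold */
    void *large  = malloc((size_t)8 << 20);    /* 8 MiB, well above it */
    void *mapped = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    printf("malloc(64)     %p  below break: %d\n", small, below_break(small));
    printf("malloc(8 MiB)  %p  below break: %d\n", large, below_break(large));
    if (mapped != MAP_FAILED)
        printf("mmap(4096)     %p  below break: %d\n", mapped, below_break(mapped));

    free(small);
    free(large);
    if (mapped != MAP_FAILED)
        munmap(mapped, 4096);
}
```

With glibc, the small block would normally be reported below the break, while the 8 MiB block and the mmap page would not. Other allocators (or a tuned M_MMAP_THRESHOLD) may behave differently, so treat the output as an observation, not a guarantee.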

You said that mmap memory could be allocated in the heap. Are you sure? I've never heard of or seen that.
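One way to double-check is to compare an mmap address against the [heap] line of /proc/self/maps directly. The sketch below assumes Linux with /proc mounted; `find_heap_range`, `in_heap_range` and `check_mmap_against_heap` are hypothetical helper names, not part of any library.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Finds the "[heap]" line in /proc/self/maps.
 * Returns 1 and fills start/end on success, 0 if there is no such line. */
int find_heap_range(unsigned long *start, unsigned long *end)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    int found = 0;

    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        /* Each line starts with "start-end" in hexadecimal. */
        if (strstr(line, "[heap]") && sscanf(line, "%lx-%lx", start, end) == 2) {
            found = 1;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Returns 1 if p falls inside the [heap] range, 0 otherwise. */
int in_heap_range(const void *p)
{
    unsigned long start, end;
    unsigned long addr = (unsigned long)(uintptr_t)p;

    if (!find_heap_range(&start, &end))
        return 0;
    return addr >= start && addr < end;
}

/* Maps one anonymous page and reports whether it is inside [heap]. */
void check_mmap_against_heap(void)
{
    unsigned long start = 0, end = 0;
    void *page = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (page == MAP_FAILED)
        return;
    if (find_heap_range(&start, &end))
        printf("[heap]  %lx-%lx\n", start, end);
    printf("mmap    %p  inside [heap]: %d\n", page, in_heap_range(page));
    munmap(page, 4096);
}
```

If `check_mmap_against_heap()` reports a page inside [heap] on your system, posting that output together with the matching /proc/self/maps lines would help explain what you observed.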
