Memory layout

Memory management is one of the most important topics for a programmer, so understanding the memory layout of a C program, and of the process it becomes at runtime, is essential.

High-level languages such as Java, Python, and C# manage memory on the programmer's behalf: a garbage collector reclaims allocated memory once it is no longer in use. C and C++ have no garbage collector, so the programmer must release dynamically allocated memory manually.

A C program is first compiled and linked into an executable object file. When the executable is run, the operating system loads it into main memory (RAM), and the CPU executes its instructions.

If you are not aware of the processes involved in compiling the C program from source to binary, read C Program Compilation Process.

The typical memory layout of a C program consists of the following segments, listed from the highest address to the lowest:

Command Line Arguments
Stack
Heap
Uninitialized Data Segment (BSS)
Initialized Data Segment
Text/Code Segment

[Figure: memory layout of a C program]

The above layout segments can be broadly classified into two:

Static Memory Layout – Text/Code, Data Segments
Dynamic Memory Layout – Stack & Heap

The executable file already contains some of these segments; the others are created dynamically at runtime.

Let's discuss each segment of the memory layout in detail:

Static Memory Layout

The static memory layout consists of three segments: the Text/Code segment, the Initialized Data segment, and the Uninitialized Data (BSS) segment. These three segments are already described in the final executable object file of the C program and are mapped directly into the memory layout of the process.

We can use the size tool to take a look at the static memory layout of the C program's executable object file.

Let’s take a look:

example c src
#include <stdlib.h>

int main() { return 0; }

command and output
size a.out

text    data     bss     dec     hex filename
1136     512       8    1656     678 a.out

Text/Code Segment

The text or code segment contains the machine-level instructions of the final executable object file. This section is one of the key parts of the static memory structure, as it holds the program's central logic.

In the memory layout, the text segment sits below the data segments and the heap. This placement shields the text section from being overwritten if the stack or heap overflows.

The text section of the final executable object file has read and execute permissions but no write permission. This prevents accidental modification of the program's machine code.

objdump -S a.out

a.out:     file format elf64-x86-64


Disassembly of section .init:

0000000000001000 <_init>:
    1000:   f3 0f 1e fa             endbr64
    1004:   48 83 ec 08             sub    $0x8,%rsp
    1008:   48 8b 05 d9 2f 00 00    mov    0x2fd9(%rip),%rax        # 3fe8 <__gmon_start__@Base>
    100f:   48 85 c0                test   %rax,%rax
    1012:   74 02                   je     1016 <_init+0x16>
    1014:   ff d0                   call   *%rax
    1016:   48 83 c4 08             add    $0x8,%rsp
    101a:   c3                      ret 

Disassembly of section .text:

0000000000001020 <_start>:
    1020:   f3 0f 1e fa             endbr64 
    1024:   31 ed                   xor    %ebp,%ebp
    1026:   49 89 d1                mov    %rdx,%r9
    1029:   5e                      pop    %rsi
    102a:   48 89 e2                mov    %rsp,%rdx
    102d:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    1031:   50                      push   %rax
    1032:   54                      push   %rsp
    1033:   45 31 c0                xor    %r8d,%r8d
    1036:   31 c9                   xor    %ecx,%ecx
    1038:   48 8d 3d da 00 00 00    lea    0xda(%rip),%rdi        # 1119 <main>
    103f:   ff 15 93 2f 00 00       call   *0x2f93(%rip)        # 3fd8 <__libc_start_main@GLIBC_2.34>
    1045:   f4                      hlt    
    1046:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    104d:   00 00 00 

0000000000001050 <deregister_tm_clones>:
    1050:   48 8d 3d d1 2f 00 00    lea    0x2fd1(%rip),%rdi        # 4028 <__TMC_END__>
    1057:   48 8d 05 ca 2f 00 00    lea    0x2fca(%rip),%rax        # 4028 <__TMC_END__>
    105e:   48 39 f8                cmp    %rdi,%rax
    1061:   74 15                   je     1078 <deregister_tm_clones+0x28>
    1063:   48 8b 05 76 2f 00 00    mov    0x2f76(%rip),%rax        # 3fe0 <_ITM_deregisterTMCloneTable@Base>
    106a:   48 85 c0                test   %rax,%rax
    106d:   74 09                   je     1078 <deregister_tm_clones+0x28>
    106f:   ff e0                   jmp    *%rax
    1071:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
    1078:   c3                      ret    
    1079:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

0000000000001080 <register_tm_clones>:
    1080:   48 8d 3d a1 2f 00 00    lea    0x2fa1(%rip),%rdi        # 4028 <__TMC_END__>
    1087:   48 8d 35 9a 2f 00 00    lea    0x2f9a(%rip),%rsi        # 4028 <__TMC_END__>
    108e:   48 29 fe                sub    %rdi,%rsi
    1091:   48 89 f0                mov    %rsi,%rax
    1094:   48 c1 ee 3f             shr    $0x3f,%rsi
    1098:   48 c1 f8 03             sar    $0x3,%rax
    109c:   48 01 c6                add    %rax,%rsi
    109f:   48 d1 fe                sar    %rsi
    10a2:   74 14                   je     10b8 <register_tm_clones+0x38>
    10a4:   48 8b 05 45 2f 00 00    mov    0x2f45(%rip),%rax        # 3ff0 <_ITM_registerTMCloneTable@Base>
    10ab:   48 85 c0                test   %rax,%rax
    10ae:   74 08                   je     10b8 <register_tm_clones+0x38>
    10b0:   ff e0                   jmp    *%rax
    10b2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
    10b8:   c3                      ret    
    10b9:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)

00000000000010c0 <__do_global_dtors_aux>:
    10c0:   f3 0f 1e fa             endbr64 
    10c4:   80 3d 5d 2f 00 00 00    cmpb   $0x0,0x2f5d(%rip)        # 4028 <__TMC_END__>
    10cb:   75 33                   jne    1100 <__do_global_dtors_aux+0x40>
    10cd:   55                      push   %rbp
    10ce:   48 83 3d 22 2f 00 00    cmpq   $0x0,0x2f22(%rip)        # 3ff8 <__cxa_finalize@GLIBC_2.2.5>
    10d5:   00 
    10d6:   48 89 e5                mov    %rsp,%rbp
    10d9:   74 0d                   je     10e8 <__do_global_dtors_aux+0x28>
    10db:   48 8b 3d 3e 2f 00 00    mov    0x2f3e(%rip),%rdi        # 4020 <__dso_handle>
    10e2:   ff 15 10 2f 00 00       call   *0x2f10(%rip)        # 3ff8 <__cxa_finalize@GLIBC_2.2.5>
    10e8:   e8 63 ff ff ff          call   1050 <deregister_tm_clones>
    10ed:   c6 05 34 2f 00 00 01    movb   $0x1,0x2f34(%rip)        # 4028 <__TMC_END__>
    10f4:   5d                      pop    %rbp
    10f5:   c3                      ret    
    10f6:   66 2e 0f 1f 84 00 00    cs nopw 0x0(%rax,%rax,1)
    10fd:   00 00 00 
    1100:   c3                      ret    
    1101:   66 66 2e 0f 1f 84 00    data16 cs nopw 0x0(%rax,%rax,1)
    1108:   00 00 00 00 
    110c:   0f 1f 40 00             nopl   0x0(%rax)

0000000000001110 <frame_dummy>:
    1110:   f3 0f 1e fa             endbr64 
    1114:   e9 67 ff ff ff          jmp    1080 <register_tm_clones>

0000000000001119 <main>:
    1119:   55                      push   %rbp
    111a:   48 89 e5                mov    %rsp,%rbp
    111d:   b8 00 00 00 00          mov    $0x0,%eax
    1122:   5d                      pop    %rbp
    1123:   c3                      ret    

Disassembly of section .fini:

0000000000001124 <_fini>:
    1124:   f3 0f 1e fa             endbr64 
    1128:   48 83 ec 08             sub    $0x8,%rsp
    112c:   48 83 c4 08             add    $0x8,%rsp
    1130:   c3                      ret    

Initialized Data Segment

All initialized global and static variables are stored in this section.

The data segment has read and write permissions. This allows the program to read and change the values of the variables in the data segment at runtime.

We add some variables to our program:

#include <stdlib.h>

int number = 10;
char example = 'C';
int numbers[4] = {1,2,3,4};

int main() {
  return 0;
}

gcc main.c
size a.out
text    data     bss     dec     hex filename
1136     544       8    1688     698 a.out

After adding these variables, the data segment grew from 512 to 544 bytes.

Uninitialized Data Segment (BSS)

The Uninitialized Data Section, also known as the “bss” segment, is named after an old assembler directive that stands for “Block Started by Symbol”.

The BSS segment contains all global and static variables that have no explicit initializer; they are zero-initialized when the program starts. This segment is placed above the data segment in the memory layout.

This segment also has both the read and write permissions.

#include <stdlib.h>

int a, b, c;
char ch;

int main() { return 0; }

size a.out
text    data     bss     dec     hex filename
1136     512      24    1672     688 a.out

This time the size of the bss segment increased from 8 bytes to 24 bytes, because we declared global variables but didn't initialize them.

Dynamic Memory Layout

This is the runtime memory of the process and exists as long as the process is running.

[Figure: dynamic memory layout]

Stack

Program execution can take place without a heap memory, but not without a stack segment. This illustrates the importance of stack memory for the execution of a program.

The stack is a region of memory in the process’s virtual address space where data is added or removed in the Last-in-First-out (LIFO) order.

A new stack-frame is added to the stack memory when a new function is invoked. The corresponding stack-frame is removed when the function returns.

One thing to note here is that every function call, not just every function, gets its own stack-frame, also known as an activation record.

The amount of stack in use varies during execution, since it depends on the size of the local variables, the parameters, and the depth of function calls. On most platforms, including x86, the stack grows from a higher address to a lower address.

Every process has its own stack, whose maximum size is set by a configurable limit. The stack memory is reclaimed by the OS when the process terminates.

Using the ulimit -s command, we can see the maximum stack size, in kilobytes, on a Linux system.

ulimit -s
8192

Use the ulimit -a command to list all the resource limits together with their flags.

ulimit -a

-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         unlimited
-m: resident set size (kbytes)      unlimited
-u: processes                       127950
-n: file descriptors                1024
-l: locked-in-memory size (kbytes)  8192
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 127950
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15: rt cpu time (microseconds)   unlimited

To find the limits of a running process in Linux, use the cat /proc/<pid>/limits command, where <pid> is the process ID.

Create a C program with an infinite loop.

endless
int main() {
    while(1){}
}

Run the executable in the background; the shell prints the process ID. Use that ID to read the limits of the process.

Kill the background process afterwards, or it will run indefinitely.

./a.out&
[1] 74395
cat /proc/74395/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             127950               127950               processes
Max open files            1024                 524288               files
Max locked memory         8388608              8388608              bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       127950               127950               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

Let's find the stack size limit from a C program.

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

int main(void) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_STACK, &lim) == 0) {
    /* rlim_t is an unsigned type; it is cast to long here so that, on
       Linux, an unlimited value (RLIM_INFINITY) prints as -1. */
    printf("Soft Limit = %ld\n", (long)lim.rlim_cur);
    printf("Max Stack Size = %ld\n", (long)lim.rlim_max);
  } else {
    printf("%s\n", strerror(errno));
  }
  return 0;
}

Soft Limit = 8388608
Max Stack Size = -1

Let's now see the stack memory layout and what the stack frame for a function contains.

Stack Memory Layout

A Stack frame contains four types of information:

Parameters passed to the function (pushed in reverse order; on x86-64 the first few are usually passed in registers instead)
The return address of the caller function.
The base pointer of the caller function
Local variables of the function

The size of the return address and base pointer is 4 bytes for 32-bit architecture and 8 bytes for 64-bit architecture.

stack layout example program
#include <stdio.h>

int sum(int a, int b) {
    return a + b;
}

float avg(int a, int b) {
    int s = sum(a, b);
    return (float)s / 2;
}

int main() {
    int a = 10;
    int b = 20;
    printf("Average of %d, %d = %f\n", a, b, avg(a, b));
    return 0;
}

[Figure: stack layout for the example program]

The frame that is currently executing is always the topmost frame of the stack. The pointer to this topmost frame is called the Frame Pointer or Base Pointer. It holds the address in the callee's stack frame at which the caller's base pointer value was saved.

The pointer to the top of the stack is called the Stack Pointer. Stack Pointer stores the address of the top of the stack memory.

Stack memory is managed automatically, for both allocation and de-allocation; the programmer does not allocate or free it explicitly. A function's local variables are allocated when its stack-frame is constructed and de-allocated when the stack-frame is popped off the stack. This is also why a local variable cannot be used after its function returns.

Stack Error Conditions

Let’s take a look at what errors we can face when dealing with the stack.

Stack Overflow

This error occurs when a program's stack grows past its fixed maximum size, typically because of a very long chain of nested function calls.

What causes a stack overflow:

Deep or unbounded recursive function calls
Declaration of large local arrays

Stack memory has a limited size, so storing large objects on it is not recommended.

Stack Corruption

Stack corruption occurs when a program writes more data into a stack buffer than the buffer can hold, overwriting neighbouring data on the stack.

Example:

stack corruption
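#include <stdio.h>
#include <string.h>

/* Deliberately unsafe: strcpy() does not check the size of the destination. */
void copy(char *argv) {
  char name[10];       /* room for 9 characters plus the terminating '\0' */
  strcpy(name, argv);  /* writes past name[] if argv has more than 9 characters */
}

int main(int argc, char **argv) {
  if (argc < 2) {      /* without an argument, argv[1] would be NULL */
    fprintf(stderr, "usage: %s <name>\n", argv[0]);
    return 1;
  }
  copy(argv[1]);
  printf("Exit\n");
  return 0;
}

The copy function above declares name, a char array of 10 bytes, and copies the command-line argument into it. If the user passes a string longer than 9 characters (the tenth byte is needed for the terminating null), strcpy() writes past the end of name and overwrites adjacent data in the stack frame, such as the saved base pointer or the return address. This is stack corruption.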

Heap

As we've seen, the stack has a limited size, which makes it unsuitable for large data, and its allocation is outside the programmer's control. Heap memory solves this problem: it is a contiguous part of the virtual address space in which memory can be allocated and de-allocated on demand at runtime.

Unlike stack memory, heap memory has no automatic management: allocating and de-allocating it is the programmer's responsibility.

To use heap memory, we need the C standard library (glibc on Linux), which provides the functions that allocate and de-allocate heap memory.

The malloc() and calloc() functions allocate a memory block from the heap segment, and the free() function returns a block previously allocated by malloc() or calloc() to the heap.

Under the hood, glibc's malloc() and calloc() use the brk() and sbrk() system calls to grow and shrink the heap of a process; for large requests glibc uses mmap() instead.

[Figure: brk/sbrk]

The functions malloc, calloc, realloc, and free are declared in the header file stdlib.h.

One factor to keep in mind is that we can only use pointers to address a heap memory block.

Now let’s see an example of how the heap memory is allocated and de-allocated.

malloc

malloc() allocates a memory block of the given size (in bytes) and returns a pointer to the beginning of the block. It doesn't initialize the allocated memory. If you read from the allocated memory without first initializing it, you invoke undefined behavior, which usually means the values you read are garbage.

calloc

calloc() allocates the memory and also initializes every byte of it to 0. If you read the allocated memory without writing to it first, you get 0, because calloc() has already zeroed it.

#include <stdio.h>
#include <stdlib.h>

void func(void) {
  int a = 10;                      /* lives in func's stack frame */
  int *aptr = &a;                  /* pointer to stack memory */
  int *ptr = malloc(sizeof *ptr);  /* pointer to a heap block */
  if (ptr == NULL) {               /* malloc() returns NULL on failure */
    perror("malloc");
    return;
  }
  *ptr = 20;
  printf("Heap Memory Value = %d\n", *ptr);
  printf("Pointing in Stack = %d\n", *aptr);
  free(ptr);                       /* release the heap block */
}

int main(void) {
  func();
  return 0;
}

[Figure: heap memory accessed through malloc()]

The image above is a simplified description of how heap memory is accessed after a malloc() call. It shows the integer value 20 stored in the 4 bytes of heap area allocated by malloc(), but that is not the whole story. The heap address is a virtual address; the MMU (Memory Management Unit) translates it to a physical address, and the value is actually written to and read from physical memory, i.e. RAM.

A heap memory block is not tied to any scope, so the programmer has to free the reserved space manually.
