About the memory layout of programs in Linux

I have some questions about the memory layout of a program in Linux. I know from various sources (I’m reading “Programming from the Ground Up”) that each section is loaded into its own region of memory. The text section loads first at virtual address 0x8048000, the data section is loaded immediately after that, next is the bss section, followed by the heap and the stack.

To experiment with the layout I wrote this program in assembly. First it prints the addresses of some labels and calculates the system break point. Then it enters an infinite loop that increments a pointer and tries to access the memory at that address. At some point a segmentation fault kills the program (this is intentional).

This is the program:

[assembly listing not preserved]

And these are the relevant parts of the output (this is 32-bit Debian):

[program output not preserved]

My questions are:

1) Why is my program starting at address 0x8048190 instead of 0x8048000? With this I guess that the instruction at the “_start” label is not the first thing to load, so what’s between the addresses 0x8048000 and 0x8048190?

2) Why is there a gap between the end of the text section and the start of the data section?

3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?

4) If the system break point is at 0x83b4001, why do I get the segmentation fault earlier, at 0x804a000?

Answer

I’m assuming you’re building this with gcc -m32 -nostartfiles segment-bounds.S or similar, so you have a 32-bit dynamic binary. (You don’t need -m32 if you’re actually using a 32-bit system, but most people that want to test this will have 64-bit systems.)

My 64-bit Ubuntu 15.10 system gives slightly different numbers from your program for a few things, but the overall pattern of behaviour is the same. (Different kernel, or just ASLR, explains this. The brk address varies wildly, for example, with values like 0x9354001 or 0x82a8001)


1) Why is my program starting at address 0x8048190 instead of 0x8048000?

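If you build a static binary, your _start will be much closer to 0x8048000, though typically not exactly at it: the ELF header and program headers still occupy the first bytes of the text segment.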

We can see from readelf -a a.out that 0x8048190 is the start of the .text section, but that isn’t the start of the text segment, which is what actually gets mapped into pages. Pages are 4096B, and Linux requires file-backed mappings to be aligned on 4096B boundaries of file position, so with the file laid out this way it wouldn’t be possible for execve to map _start to the start of a page. (The Off column in the readelf section listing is the position within the file.)
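The alignment argument can be checked with a little arithmetic. This is a minimal Python sketch, not part of the original program; PAGE_SIZE and the helper name split_address are illustrative assumptions. It splits a virtual address into the base of its 4096-byte page and the offset within that page.

```python
PAGE_SIZE = 4096  # assumed x86 page size, as stated above


def split_address(addr, page_size=PAGE_SIZE):
    """Return (page_base, offset_in_page) for a virtual address."""
    offset = addr % page_size
    return addr - offset, offset


base, offset = split_address(0x8048190)
print(hex(base), hex(offset))  # 0x8048000 0x190
```

Assuming the text segment is mapped from file offset 0 (typical, but not confirmed here), .text at 0x8048190 sits 0x190 bytes into the file, and whatever occupies those first 0x190 bytes is mapped ahead of it in the same page.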

Presumably the other sections in the text segment before the .text section are read-only data that’s needed by the dynamic linker, so it makes sense to have it mapped into memory in the same page.

[readelf output not preserved]

2) Why is there a gap between the end of the text section and the start of the data section?

Why not? They have to be in different segments of the executable, so mapped to different pages. (Text is read-only and executable, and can be MAP_SHARED. Data is read-write and has to be MAP_PRIVATE. BTW, in Linux the default is for data to also be executable.)

Leaving a gap makes room for the dynamic linker to map the text segment of shared libraries next to the text of the executable. It also means an out-of-bounds array index into the data section is more likely to segfault. (Earlier and noisier failure is always easier to debug).


3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?

That’s interesting. They’re in the bss, but IDK why the current position isn’t affected by .lcomm labels. Probably they go in a different subsection before linking, since you used .lcomm instead of .comm. If I use .skip or .zero to reserve space, I get the results you expected:

[code listing not preserved]

.lcomm puts things in the BSS even if you don’t switch to that section, i.e. it doesn’t care what the current section is, and maybe doesn’t care about or affect the current position in the .bss section. TL;DR: when you switch to the .bss manually, use .zero or .skip, not .comm or .lcomm.


4) If the system break point is at 0x83b4001, why do I get the segmentation fault earlier, at 0x804a000?

That tells us that there are unmapped pages between the text segment and the brk. (Your loop starts with ebx = $start_text, so it faults on the first unmapped page after the text segment.) Besides the hole in virtual address space between text and data, there are probably also other holes beyond the data segment.

Memory protection has page granularity (4096B), so the first address to fault will always be the first byte of a page.
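To illustrate why the fault lands exactly on a page boundary, here is a small Python sketch that walks upward from a starting address through a set of mapped ranges and reports the first unmapped address. The mapping list is hypothetical (picked to be consistent with the addresses in the question, not read from the real process), and first_unmapped is an illustrative helper name.

```python
PAGE_SIZE = 4096  # assumed x86 page size

# Hypothetical page-aligned mappings as (start, end), end exclusive.
# Illustrative values only, not taken from the real process.
MAPPINGS = [
    (0x8048000, 0x804A000),  # assumed mapped pages starting at the text segment
    (0x8393000, 0x83B5000),  # assumed heap pages reaching up to the brk
]


def first_unmapped(start, mappings):
    """Return the first address >= start that lies in no mapping."""
    addr = start
    for lo, hi in sorted(mappings):
        if addr < lo:
            break      # addr sits in a hole below this mapping
        if addr < hi:
            addr = hi  # addr is mapped: jump to the end of this mapping
    return addr


fault = first_unmapped(0x8048190, MAPPINGS)
print(hex(fault), fault % PAGE_SIZE == 0)  # 0x804a000 True
```

Because every mapping starts and ends on a page boundary, the first faulting address found this way is always a multiple of the page size, no matter where inside a page the scan begins.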

User contributions licensed under: CC BY-SA