I have some questions about the memory layout of a program in Linux. I know from various sources (I’m reading “Programming from the Ground Up”) that each section is loaded into its own region of memory. The text section loads first at virtual address 0x8048000, the data section is loaded immediately after that, next is the bss section, followed by the heap and the stack.
To experiment with the layout I made this program in assembly. First it prints the addresses of some labels and calculates the system break point. Then it enters an infinite loop. The loop increments a pointer and then tries to access the memory at that address; at some point a segmentation fault will end the program (I did this intentionally).
This is the program:
.section .data
start_data:
str_mem_access:
    .ascii "Accessing address: 0x%x\n\0"
str_data_start:
    .ascii "Data section start at: 0x%x\n\0"
str_data_end:
    .ascii "Data section ends at: 0x%x\n\0"
str_bss_start:
    .ascii "bss section starts at: 0x%x\n\0"
str_bss_end:
    .ascii "bss section ends at: 0x%x\n\0"
str_text_start:
    .ascii "text section starts at: 0x%x\n\0"
str_text_end:
    .ascii "text section ends at: 0x%x\n\0"
str_break:
    .ascii "break at: 0x%x\n\0"
end_data:

.section .bss
start_bss:
    .lcomm buffer, 500
    .lcomm buffer2, 250
end_bss:

.section .text
start_text:
.globl _start
_start:
    # print address of start_text label
    pushl $start_text
    pushl $str_text_start
    call printf
    addl $8, %esp

    # print address of end_text label
    pushl $end_text
    pushl $str_text_end
    call printf
    addl $8, %esp

    # print address of start_data label
    pushl $start_data
    pushl $str_data_start
    call printf
    addl $8, %esp

    # print address of end_data label
    pushl $end_data
    pushl $str_data_end
    call printf
    addl $8, %esp

    # print address of start_bss label
    pushl $start_bss
    pushl $str_bss_start
    call printf
    addl $8, %esp

    # print address of end_bss label
    pushl $end_bss
    pushl $str_bss_end
    call printf
    addl $8, %esp

    # get last usable virtual memory address
    movl $45, %eax
    movl $0, %ebx
    int $0x80
    incl %eax               # system break address

    # print system break
    pushl %eax
    pushl $str_break
    call printf
    addl $4, %esp

    movl $start_text, %ebx
loop:
    # print address
    pushl %ebx
    pushl $str_mem_access
    call printf
    addl $8, %esp

    # access address
    # segmentation fault here
    movb (%ebx), %dl
    incl %ebx
    jmp loop
end_loop:
    movl $1, %eax
    movl $0, %ebx
    int $0x80
end_text:
And these are the relevant parts of the output (this is 32-bit Debian):
text section starts at: 0x8048190
text section ends at: 0x804823b
Data section start at: 0x80492ec
Data section ends at: 0x80493c0
bss section starts at: 0x80493c0
bss section ends at: 0x80493c0
break at: 0x83b4001
Accessing address: 0x8048190
Accessing address: 0x8048191
Accessing address: 0x8048192
[...]
Accessing address: 0x8049fff
Accessing address: 0x804a000
Violación de segmento (i.e. Segmentation fault)
My questions are:
1) Why is my program starting at address 0x8048190 instead of 0x8048000? With this I guess that the instruction at the “_start” label is not the first thing to load, so what’s between the addresses 0x8048000 and 0x8048190?
2) Why is there a gap between the end of the text section and the start of the data section?
3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?
4) If the system break point is at 0x83b4001, why do I get the segmentation fault earlier at 0x804a000?
Answer
I’m assuming you’re building this with gcc -m32 -nostartfiles segment-bounds.S or similar, so you have a 32-bit dynamic binary. (You don’t need -m32 if you’re actually using a 32-bit system, but most people who want to test this will have 64-bit systems.)
My 64-bit Ubuntu 15.10 system gives slightly different numbers from your program for a few things, but the overall pattern of behaviour is the same. (A different kernel, or just ASLR, explains this. The brk address varies wildly, for example, with values like 0x9354001 or 0x82a8001.)
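For anyone who wants to watch the break move between runs without assembling the 32-bit program, here is a rough C equivalent of the question's brk query. It is only a sketch: it assumes glibc's sbrk() wrapper rather than the raw int $0x80 call (system call 45 on 32-bit x86), and current_break and print_break are names made up for this illustration.

```c
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc's <unistd.h> */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Current program break as an integer.
   sbrk(0) only queries the break; it does not move it. */
uintptr_t current_break(void) {
    void *brk_now = sbrk(0);
    assert(brk_now != (void *)-1);   /* (void *)-1 is sbrk's error value */
    return (uintptr_t)brk_now;
}

/* Print the break in the same style as the question's program
   (without the extra incl that the assembly version applies). */
void print_break(void) {
    printf("break at: %#lx\n", (unsigned long)current_break());
}
```

Calling print_break() from main and running the binary several times should show different values whenever ASLR is enabled.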
1) Why is my program starting at address 0x8048190 instead of 0x8048000?
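If you build a static binary, your _start will be much closer to 0x8048000, though not exactly at it, because the ELF header and program headers are still mapped at the start of the text segment.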
We can see from readelf -a a.out that this address (0x8048190 in your build, 0x80481f0 in mine) is the start of the .text section. But it isn’t at the start of the text segment that’s mapped to a page. Pages are 4096B, and Linux requires mappings to be aligned on 4096B boundaries of file position, so with the file laid out this way, it wouldn’t be possible for execve to map _start to the start of a page. (The Off column is the position within the file.)
Presumably the other sections in the text segment before the .text section are read-only data that’s needed by the dynamic linker, so it makes sense to have them mapped into memory in the same page.
## part of readelf -a output
Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
  [ 2] .note.gnu.build-i NOTE            08048128 000128 000024 00   A  0   0  4
  [ 3] .gnu.hash         GNU_HASH        0804814c 00014c 000018 04   A  4   0  4
  [ 4] .dynsym           DYNSYM          08048164 000164 000020 10   A  5   1  4
  [ 5] .dynstr           STRTAB          08048184 000184 00001c 00   A  0   0  1
  [ 6] .gnu.version      VERSYM          080481a0 0001a0 000004 02   A  4   0  2
  [ 7] .gnu.version_r    VERNEED         080481a4 0001a4 000020 00   A  5   1  4
  [ 8] .rel.plt          REL             080481c4 0001c4 000008 08  AI  4   9  4
  [ 9] .plt              PROGBITS        080481d0 0001d0 000020 04  AX  0   0 16
  [10] .text             PROGBITS        080481f0 0001f0 0000ad 00  AX  0   0  1   ########## The .text section
  [11] .eh_frame         PROGBITS        080482a0 0002a0 000000 00   A  0   0  4
  [12] .dynamic          DYNAMIC         08049f60 000f60 0000a0 08  WA  5   0  4
  [13] .got.plt          PROGBITS        0804a000 001000 000010 04  WA  0   0  4
  [14] .data             PROGBITS        0804a010 001010 0000d4 00  WA  0   0  1
  [15] .bss              NOBITS          0804a0e8 0010e4 0002f4 00  WA  0   0  8
  [16] .shstrtab         STRTAB          00000000 0010e4 0000a2 00      0   0  1
  [17] .symtab           SYMTAB          00000000 001188 0002b0 10     18  38  4
  [18] .strtab           STRTAB          00000000 001438 000123 00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)
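The alignment argument can be checked with a little arithmetic on the Addr and Off columns of the listing above. The following is a minimal sketch, assuming 4096-byte pages as stated in the answer. PAGE_BYTES, page_base, page_offset and mappable are names invented for this illustration, not part of any tool.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size, as stated in the answer */

/* Address of the first byte of the page that contains addr. */
uint32_t page_base(uint32_t addr) {
    assert((PAGE_BYTES & (PAGE_BYTES - 1u)) == 0);   /* must be a power of two */
    return addr & ~(PAGE_BYTES - 1u);
}

/* Position of addr within its page. */
uint32_t page_offset(uint32_t addr) {
    return addr & (PAGE_BYTES - 1u);
}

/* A byte at file offset file_off can show up at virtual address vaddr
   only if both sit at the same position within a page, because whole
   pages of the file are mapped onto whole pages of memory. */
int mappable(uint32_t vaddr, uint32_t file_off) {
    return page_offset(vaddr) == page_offset(file_off);
}
```

With the .text row (Addr 080481f0, Off 0001f0), page_base(0x80481f0) is 0x8048000 and mappable(0x80481f0, 0x1f0) holds, while mappable(0x8048000, 0x1f0) does not. That is why _start cannot land on the first byte of a page with this file layout.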
2) Why is there a gap between the end of the text section and the start of the data section?
Why not? They have to be in different segments of the executable, so they are mapped to different pages. (Text is read-only and executable, and can be MAP_SHARED. Data is read-write and has to be MAP_PRIVATE. BTW, in Linux the default is for data to also be executable.)
Leaving a gap makes room for the dynamic linker to map the text segment of shared libraries next to the text of the executable. It also means an out-of-bounds array index into the data section is more likely to segfault. (Earlier and noisier failure is always easier to debug.)
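The separate mappings, their permissions and the holes between them can be inspected at run time. Here is a minimal sketch, assuming a Linux system with /proc mounted; dump_maps is a hypothetical helper name chosen for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy /proc/self/maps (Linux-specific) to out, one mapping per line.
   Each line starts with "start-end perms", e.g. r-xp for text, rw-p for data.
   Returns the number of complete lines copied, or -1 if the file
   cannot be opened. */
int dump_maps(FILE *out) {
    assert(out != NULL);
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[512];
    int count = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        fputs(line, out);
        if (strchr(line, '\n') != NULL)   /* long lines arrive in pieces */
            count++;
    }
    fclose(maps);
    return count;
}
```

Calling dump_maps(stdout) from main prints the address ranges of the executable's own mappings, followed by the heap, shared libraries and the stack. Any address not covered by one of the listed ranges is unmapped.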
3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?
That’s interesting. They’re in the bss, but I don’t know why the current position isn’t affected by .lcomm labels. Probably they go in a different subsection before linking, since you used .lcomm instead of .comm. If I use .skip or .zero to reserve space, I get the results you expected:
.section .bss
start_bss:
    #.lcomm buffer, 500
    #.lcomm buffer2, 250
buffer:
    .skip 500
buffer2:
    .skip 250
end_bss:
.lcomm puts things in the BSS even if you don’t switch to that section, i.e. it doesn’t care what the current section is, and maybe doesn’t care about or affect what the current position in the .bss section is. TL;DR: when you switch to the .bss section manually, use .zero or .skip, not .comm or .lcomm.
4) If the system break point is at 0x83b4001, why I get the segmentation fault earlier at 0x804a000?
That tells us that there are unmapped pages between the text segment and the brk. (Your loop starts with ebx = $start_text, so it faults on the first unmapped page after the text segment.) Besides the hole in virtual address space between text and data, there are probably also other holes beyond the data segment.
Memory protection has page granularity (4096B), so the first address to fault will always be the first byte of a page.
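The page-granularity point can be made concrete with the numbers from the question. This is a small sketch, assuming 4096-byte pages and assuming nothing else happens to be mapped directly after the data segment's last page; PAGE_BYTES and page_round_up are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_BYTES 4096u   /* assumed page size */

/* Round addr up to the next page boundary.
   An address that is already page-aligned is returned unchanged. */
uint32_t page_round_up(uint32_t addr) {
    assert((PAGE_BYTES & (PAGE_BYTES - 1u)) == 0);   /* must be a power of two */
    return (addr + PAGE_BYTES - 1u) & ~(PAGE_BYTES - 1u);
}
```

The question's output reports the data and bss labels ending at 0x80493c0, and page_round_up(0x80493c0) gives 0x804a000, which is exactly the address where the loop faulted. Every byte up to 0x8049fff belongs to a mapped page even though the program never asked for it.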