About the memory layout of programs in Linux

Question

I have some questions about the memory layout of a program in Linux. I know from various sources (I'm reading "Programming from the Ground Up") that each section is loaded into its own region of memory. The text section loads first at virtual address 0x8048000, the data section is loaded immediately after that, next is the bss section, followed by

Accepted Answer

I'm assuming you're building this with gcc -m32 -nostartfiles segment-bounds.S or similar, so you have a 32-bit dynamic binary. (You don't need -m32 if you're actually using a 32-bit system, but most people who want to test this will have 64-bit systems.)

My 64-bit Ubuntu 15.10 system gives slightly different numbers from your program for a few things, but the overall pattern of behaviour is the same. (A different kernel, or just ASLR, explains this. The brk address varies wildly, for example, with values like 0x9354001 or 0x82a8001.)

1) Why is my program starting at address 0x8048190 instead of 0x8048000?

If you build a static binary, your _start will land very near 0x8048000 (the ELF headers still come first in the mapping).

We can see from readelf -a a.out that your entry point is simply the start of the .text section (0x8048190 in your build, 0x80481f0 in the output from my build below). But it isn't at the start of the text segment that's mapped to a page. (Pages are 4096B, and Linux requires mappings to be aligned on 4096B boundaries of file position, so with the file laid out this way, it wouldn't be possible for execve to map _start to the start of a page. I think the Off column is the position within the file.)

Presumably the other sections in the text segment before the .text section are read-only data that's needed by the dynamic linker, so it makes sense to have them mapped into memory in the same page.

    ## part of readelf -a output
    Section Headers:
      [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
      [ 0]                   NULL            00000000 000000 000000 00      0   0  0
      [ 1] .interp           PROGBITS        08048114 000114 000013 00   A  0   0  1
      [ 2] .note.gnu.build-i NOTE            08048128 000128 000024 00   A  0   0  4
      [ 3] .gnu.hash         GNU_HASH        0804814c 00014c 000018 04   A  4   0  4
      [ 4] .dynsym           DYNSYM          08048164 000164 000020 10   A  5   1  4
      [ 5] .dynstr           STRTAB          08048184 000184 00001c 00   A  0   0  1
      [ 6] .gnu.version      VERSYM          080481a0 0001a0 000004 02   A  4   0  2
      [ 7] .gnu.version_r    VERNEED         080481a4 0001a4 000020 00   A  5   1  4
      [ 8] .rel.plt          REL             080481c4 0001c4 000008 08  AI  4   9  4
      [ 9] .plt              PROGBITS        080481d0 0001d0 000020 04  AX  0   0 16
      [10] .text             PROGBITS        080481f0 0001f0 0000ad 00  AX  0   0  1         ########## The .text section
      [11] .eh_frame         PROGBITS        080482a0 0002a0 000000 00   A  0   0  4
      [12] .dynamic          DYNAMIC         08049f60 000f60 0000a0 08  WA  5   0  4
      [13] .got.plt          PROGBITS        0804a000 001000 000010 04  WA  0   0  4
      [14] .data             PROGBITS        0804a010 001010 0000d4 00  WA  0   0  1
      [15] .bss              NOBITS          0804a0e8 0010e4 0002f4 00  WA  0   0  8
      [16] .shstrtab         STRTAB          00000000 0010e4 0000a2 00      0   0  1
      [17] .symtab           SYMTAB          00000000 001188 0002b0 10     18  38  4
      [18] .strtab           STRTAB          00000000 001438 000123 00      0   0  1
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings)
      I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
      O (extra OS processing required) o (OS specific), p (processor specific)

2) Why is there a gap between the end of the text section and the start of the data section?

Why not? They have to be in different segments of the executable, so they're mapped to different pages. (Text is read-only and executable, and can be MAP_SHARED. Data is read-write and has to be MAP_PRIVATE. BTW, in Linux the default is for data to also be executable.)

Leaving a gap makes room for the dynamic linker to map the text segment of shared libraries next to the text of the executable. It also means an out-of-bounds array index into the data section is more likely to segfault. (Earlier and noisier failure is always easier to debug.)

3) The bss start and end addresses are the same. I assume that the two buffers are stored somewhere else, is this correct?

That's interesting. They're in the bss, but IDK why the current position isn't affected by .lcomm labels. Probably they go in a different subsection before linking, since you used .lcomm instead of .comm. If I use .skip or .zero to reserve space, I get the results you expected:

    .section .bss
    start_bss:
    #.lcomm buffer, 500
    #.lcomm buffer2, 250
    buffer:  .skip 500
    buffer2: .skip 250
    end_bss:

.lcomm puts things in the BSS even if you don't switch to that section, i.e. it doesn't care what the current section is, and maybe doesn't care about or affect what the current position in the .bss section is. TL;DR: when you switch to the .bss manually, use .zero or .skip, not .comm or .lcomm.

4) If the system break point is at 0x83b4001, why do I get the segmentation fault earlier at 0x804a000?

That tells us that there are unmapped pages between the text segment and the brk. (Your loop starts with ebx = $start_text, so it faults on the first unmapped page after the text segment.) Besides the hole in virtual address space between text and data, there are probably also other holes beyond the data segment.

Memory protection has page granularity (4096B), so the first address to fault will always be the first byte of a page.
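
To make the page arithmetic behind 1) and 4) concrete, here is a minimal Python sketch. It assumes 4096-byte pages as above. The helper names (page_base, first_unmapped_page) and the sample mapping list are hypothetical, made up for illustration; the list is chosen so that the hole starts at 0x804a000 like the fault in the question, and is not the asker's real layout.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on 32-bit x86 Linux


def page_base(addr):
    """Round an address down to the start of the page that contains it."""
    return addr & ~(PAGE_SIZE - 1)


def first_unmapped_page(start, mappings):
    """Scan upward from `start`, one page at a time, and return the first
    page-aligned address that no mapping covers.

    `mappings` is a list of (begin, end) pairs with `end` exclusive,
    the same convention /proc/<pid>/maps uses.
    """
    addr = page_base(start)
    while any(begin <= addr < end for begin, end in mappings):
        addr += PAGE_SIZE
    return addr


# _start from the question: it sits 0x190 bytes into its page, because the
# ELF headers and the sections before .text are mapped ahead of it.
entry = 0x8048190
print(hex(page_base(entry)), hex(entry - page_base(entry)))  # 0x8048000 0x190

# Hypothetical layout (illustrative values only): a mapping that starts at
# the text segment, a hole, then another mapping further up.
example_mappings = [(0x08048000, 0x0804A000), (0x0804C000, 0x0804D000)]
print(hex(first_unmapped_page(entry, example_mappings)))  # 0x804a000
```

To check this against a real process, the (begin, end) pairs could be taken from /proc/<pid>/maps instead of the made-up list. Whatever the layout, the result is always a multiple of 4096, which is why the fault address in the question ends in 000.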

Answer