If I understand correctly, when a user tries to execute a dynamically linked executable (with execve("foo", "", "")), the dynamic linker (ld-linux.so.2) is loaded and executed instead of the text segment of “foo”. It has to load the libraries required for the program (“foo”) to run, change some addresses in “foo”, and pass control to “foo”. But how is this accomplished?
How (with what system call) and where in memory does the dynamic loader load the libraries and “foo”’s code and data? (I am guessing it can’t simply use malloc or mmap and then jump to the code, since that should be impossible, right? It also seems unlikely that it creates a temp file with the complete executable (like a statically linked one) and calls execve again.)
Answer
The actual implementation is quite complex because it builds on top of ELF, which is itself complex since it tries to accommodate many scenarios, but conceptually it’s quite simple.
Basically (after the library dependencies are located and opened) it’s a couple of mmaps and mprotects, some modifications to implement the linking by binding symbols (which can be deferred), and then a jump to the code.
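The mmap / mprotect / jump sequence can be tried in miniature. The following is only a sketch under stated assumptions, not what ld-linux.so.2 actually does: it uses an anonymous mapping instead of mapping an ELF file, the function name map_and_call is made up for illustration, and the hand-written machine code ("return 42") only covers x86 and AArch64.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: tiny hand-assembled function standing in for a library's code. */
#if defined(__x86_64__) || defined(__i386__)
static const unsigned char code[] = {
    0xb8, 0x2a, 0x00, 0x00, 0x00,   /* mov eax, 42 */
    0xc3                            /* ret         */
};
#elif defined(__aarch64__)
static const unsigned char code[] = {
    0x40, 0x05, 0x80, 0x52,         /* mov w0, #42 */
    0xc0, 0x03, 0x5f, 0xd6          /* ret         */
};
#else
#error "this sketch only carries machine code for x86 and AArch64"
#endif

/* Hypothetical helper: returns the result of the mapped code, or -1 on error. */
int map_and_call(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    assert(sizeof code <= len);

    /* 1. Get writable memory (a real loader would mmap the ELF file here). */
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* 2. Put the code in place. */
    memcpy(p, code, sizeof code);

    /* 3. Drop write permission and add execute permission. */
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(p, len);
        return -1;
    }
    /* Needed on architectures with separate instruction caches. */
    __builtin___clear_cache((char *)p, (char *)p + sizeof code);

    /* 4. Jump to the code. */
    int (*fn)(void) = (int (*)(void))(uintptr_t)p;
    int result = fn();

    munmap(p, len);
    return result;
}
```

Calling map_and_call() from main should hand back the value computed by the freshly mapped code, which shows that "mmap and then jump" is not impossible after all.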
Ideally, the linked shared libraries will have been compiled with -fpic/-fPIC, which allows the linker to place them anywhere in the process’s address space without having to write to the .text section (= executable code) of the library.
Such a library/executable will call functions from other libraries via a modifiable table, which the linker will fix up (probably lazily) to point to the actual locations where it has loaded the dependent library.
Access to variables from one shared library to another is similarly indirected.
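The idea of calling through a modifiable table that gets fixed up lazily can be imitated in plain C with a function pointer. This is a simplified analogy, not the real PLT/GOT mechanism; all names here (square_slot, resolve_square, real_square, call_square) are invented for the illustration.

```c
#include <assert.h>

/* Stands in for a function that lives in some other shared library. */
static int real_square(int x) { return x * x; }

static int resolve_square(int x);   /* first-call stub, defined below */

/* One slot of the modifiable table. It starts out pointing at the stub. */
static int (*square_slot)(int) = resolve_square;

/* Counts how often the "linker" had to resolve the symbol. */
static int resolve_count = 0;

/* Runs only on the first call: binds the symbol, then forwards the call. */
static int resolve_square(int x)
{
    assert(square_slot == resolve_square);  /* stub runs only while unbound */
    resolve_count++;
    square_slot = real_square;              /* the fix-up: patch the table */
    return square_slot(x);
}

/* Callers always go through the slot, never through a hard-coded address. */
int call_square(int x)
{
    return square_slot(x);
}
```

Only the table slot is ever written to; the calling code itself stays untouched, which is what lets the code pages remain read-only.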
Limiting modifications of library data/code as much as possible allows sections of code to be marked read-only (via the MMU / the mprotect system call) and mapped into memory that’s shared among all processes that use that particular library.
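To see where the libraries ended up and which of their segments are not writable, a process can ask the dynamic linker about itself. The sketch below assumes glibc on Linux, where dl_iterate_phdr is available from <link.h>; the struct tally and the function names are made up for this example.

```c
#define _GNU_SOURCE   /* dl_iterate_phdr is a glibc/Linux extension */
#include <assert.h>
#include <link.h>
#include <stdio.h>

/* Hypothetical summary type for this sketch. */
struct tally {
    int objects;            /* executable + shared objects seen */
    int readonly_segments;  /* PT_LOAD segments without write permission */
};

/* Called once per loaded object (main program, libraries, vDSO). */
static int visit(struct dl_phdr_info *info, size_t size, void *data)
{
    struct tally *t = data;
    (void)size;
    assert(t != NULL);

    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name : "(main program)";
    printf("%s loaded at base %#lx\n", name, (unsigned long)info->dlpi_addr);

    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type != PT_LOAD)
            continue;   /* only loadable segments get mapped */
        printf("  segment at +%#lx, size %#lx, %c%c%c\n",
               (unsigned long)ph->p_vaddr, (unsigned long)ph->p_memsz,
               (ph->p_flags & PF_R) ? 'r' : '-',
               (ph->p_flags & PF_W) ? 'w' : '-',
               (ph->p_flags & PF_X) ? 'x' : '-');
        if (!(ph->p_flags & PF_W))
            t->readonly_segments++;
    }
    t->objects++;
    return 0;   /* 0 means: keep iterating */
}

struct tally survey_loaded_objects(void)
{
    struct tally t = { 0, 0 };
    dl_iterate_phdr(visit, &t);
    return t;
}
```

The non-writable segments listed here are the ones that can be shared between all processes using the same library.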
To get an idea of what happens at the syscall level, you can try e.g.:
strace /bin/echo hello world
and all the syscalls up to and including roughly the first brk call (sbrk shows up as brk in strace output; it sets up the heap / .data segment) should be the doings of the dynamic linker.
(malloc is indeed unavailable to the linker, as malloc is a feature of the C library, not the system. malloc is about growing and managing the heap section, and potentially mmapping other separate blocks and managing those alongside the writable “heap”. The dynamic linker isn’t concerned with these sections of a process image, mainly just its writable indirection tables and where it maps libraries.)
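For illustration, code that cannot rely on malloc can still get memory straight from the kernel with mmap. The following bump allocator is a hypothetical sketch of that idea, not the dynamic linker's actual code; bump_alloc, ARENA_SIZE, and the 16-byte alignment are assumptions made for the example.

```c
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#define ARENA_SIZE (64 * 1024)   /* arbitrary size chosen for this sketch */

static unsigned char *arena;     /* start of the mapped block, NULL until first use */
static size_t used;              /* bytes handed out so far */

/* Hypothetical helper: returns 16-byte-aligned chunks and never frees them. */
void *bump_alloc(size_t n)
{
    if (arena == NULL) {
        /* Ask the kernel directly for zeroed, writable memory. */
        void *p = mmap(NULL, ARENA_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return NULL;
        arena = p;
    }

    /* Round the request up to a multiple of 16, guarding against overflow. */
    size_t rounded = (n + 15) & ~(size_t)15;
    if (rounded < n || rounded > ARENA_SIZE - used)
        return NULL;

    void *out = arena + used;
    used += rounded;
    assert(used <= ARENA_SIZE);
    return out;
}
```

A full malloc adds freeing, reuse, and heap growth on top of this; the sketch only shows that the kernel interface alone is enough to obtain writable memory.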