I’m trying to understand how the linker and loader work, and how memory addresses (physical or virtual) come into play when a program is compiled and executed. I came across two pieces of information and formed my own understanding from them.
1st information:
W.5.1 SHARED OBJECTS In a typical system, a number of programs will be running. Each program relies on a number of functions, some of which will be standard C library functions, like printf(), malloc(), strcpy(), etc., and some of which are non-standard or user-defined functions. If every program uses the standard C library, each program would normally have its own copy of this library present within it. Unfortunately, this results in wasted resources and degrades efficiency and performance. **Since the C library is common, it is better to have each program reference one common instance of that library, instead of having each program contain a copy of the library. This is implemented during the linking process, where some of the objects are linked at link time whereas others are linked at run time (deferred/dynamic linking).**
2nd information:
C Library
One thing up front: When you begin working on your kernel, you do not have a C library available. You have to provide everything yourself, except a few pieces provided by the compiler itself. You will also have to port an existing C library or write one yourself. The C library implements the standard C functions (i.e., the things declared in the standard headers) and provides them in binary form suitable for linking with user-space applications. In addition to standard C functions (as defined in the ISO standard), a C library might (and usually does) implement further functionality, which might or might not be defined by some standard. The standard C library says nothing about networking, for example. For Unix-like systems, the POSIX standard defines what is expected from a C library; other systems might differ fundamentally. It should be noted that, in order to implement its functionality, the C library must call kernel functions. So, for your own OS, you can of course take a ready-made C library and just recompile it for your OS, but that requires that you tell the library how to call your kernel functions, and that your kernel actually provides those functions. A more elaborate example is available in Library Calls, or you can use an existing C Library or create your own C Library.
The way I understood it:
When a computer boots, it has no access to a C library at first and must work with machine code. With the help of the boot code, it eventually starts loading the OS. In this example, I will assume the computer is loading Linux, so a Linux kernel will be loaded.
When the Linux kernel boots, the standard C library (basic functions such as printf) is also loaded into low memory (the portion of RAM assigned to kernel space). Assume that a user has written a simple program that uses printf() from the standard C library. The user compiles this code, and during this process the linker creates a ‘reference’ for printf() that indicates where the printf() function resides in low memory. When the program is executed, the loader loads the executable from the HDD into high memory (the portion of RAM assigned to user space). When the process reaches the printf() call, it branches to the low-memory address containing the start of the printf() function.
Am I correct? If not, where am I wrong?
Answer
You are wrong.
1.) There is no need to put libc into the kernel. libc is not involved in any low-level system or hardware-dependent components.
2.) libc.so is an ordinary dynamic library.
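One way to check this yourself, as a rough sketch that assumes Linux and a Python interpreter with ctypes: look up the address of printf in the running process and find which line of /proc/self/maps contains it. The helper names find_mapping and where_is_printf are made up for this illustration. On a typical system the address should land in an executable, file-backed user-space mapping of the libc shared object rather than anywhere in kernel memory.

```python
import ctypes


def find_mapping(addr, maps_text):
    """Return (start, end, perms, path) of the mapping that contains addr.

    maps_text is text in the /proc/<pid>/maps format. Returns None when no
    mapping contains the address.
    """
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # not a mapping line (for example an elided "...")
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if start <= addr < end:
            # Anonymous mappings have no path column.
            path = fields[5].strip() if len(fields) > 5 else ""
            return start, end, fields[1], path
    return None


def where_is_printf():
    """Return (address of printf, mapping that contains it) for this process."""
    # CDLL(None) is dlopen(NULL): it exposes the symbols already visible
    # to the current process, which normally include libc's printf.
    process = ctypes.CDLL(None)
    addr = ctypes.cast(process.printf, ctypes.c_void_p).value
    with open("/proc/self/maps") as maps:
        return addr, find_mapping(addr, maps.read())


if __name__ == "__main__":
    addr, mapping = where_is_printf()
    print(hex(addr), mapping)
```

The exact address and library path differ between systems and between runs, so no particular output is shown here.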
Now some more details:
When you launch your application, for example from a bash console, bash forks and execs a new process. What does that mean? It means that the OS creates an address space, loads .text and .data from the ELF file, sets up the zero-filled .bss, and reserves virtual address space for the stack. You can see these mappings here:
    sudo cat /proc/1118/maps
    00400000-00407000 r-xp 00000000 08:01 1845158 /sbin/getty
    00606000-00607000 r--p 00006000 08:01 1845158 /sbin/getty
    00607000-00608000 rw-p 00007000 08:01 1845158 /sbin/getty
    00608000-0060a000 rw-p 00000000 00:00 0
    00ff3000-01014000 rw-p 00000000 00:00 0 [heap]
    ...
    7f728efd3000-7f728efd5000 rw-p 001bf000 08:01 466797 /lib/x86_64-linux-gnu/libc-2.19.so
    7f728efd5000-7f728efda000 rw-p 00000000 00:00 0
    7f728efda000-7f728effd000 r-xp 00000000 08:01 466799 /lib/x86_64-linux-gnu/ld-2.19.so
    7f728f1fe000-7f728f1ff000 rw-p 00000000 00:00 0
    7fffa122b000-7fffa124c000 rw-p 00000000 00:00 0 [stack]
    7fffa1293000-7fffa1295000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
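The fork-and-exec step can be imitated in a few lines. This is only a sketch of what a shell roughly does, not how bash is actually implemented, and spawn_and_capture is a hypothetical helper name. It forks, replaces the child with the requested program, and collects the child's output through a pipe. Running cat /proc/self/maps this way makes the freshly exec'ed process print its own mappings (this part assumes Linux and a cat binary on the PATH).

```python
import os
import sys


def spawn_and_capture(argv):
    """Fork, exec argv in the child, and return (exit_code, stdout_bytes)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: send stdout into the pipe, then replace this process image.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)  # only returns (by raising) on failure
        finally:
            os._exit(127)  # exec failed; leave without running parent cleanup

    # Parent: read everything the child writes, then reap it.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status), b"".join(chunks)
    # Killed by a signal: report it as a negative number.
    return -os.WTERMSIG(status), b"".join(chunks)


if __name__ == "__main__":
    code, output = spawn_and_capture(["cat", "/proc/self/maps"])
    sys.stdout.write(output.decode(errors="replace"))
    print("exit code:", code)
```

Between the fork and the exec the child is still a copy of the parent; the address space with the new program's segments is only set up by the kernel during the exec call.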
But there is more. After loading those segments, the Linux kernel also loads ld-linux.so into memory (you can see it in the mappings). This is the dynamic linker, and it is responsible for loading all dynamic libraries. As you might know, the list of shared libraries an application uses is already fixed by the time it has been compiled and linked. You can check it with the ldd command:
    ldd /sbin/getty
        linux-vdso.so.1 => (0x00007fff4cfa6000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2af2832000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2af2c24000)
This list is stored in the ELF file itself, as DT_NEEDED entries in its dynamic section. So after loading, ld-linux uses this list and finds all needed libraries in predefined (standard) paths like /usr/lib and so on. Then ld-linux can simply mmap regions for the dynamic libraries it located. That is how libc gets loaded into the process address space.
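For the curious, the library list can be read straight out of an executable. In the ELF format the names of required shared libraries are recorded as DT_NEEDED entries in the dynamic section, and each entry is an offset into the dynamic string table (DT_STRTAB). The following is a simplified sketch, not a full ELF parser: needed_libraries is a hypothetical helper, it handles only 64-bit little-endian files, and it does no validation beyond the magic bytes.

```python
import struct
import sys

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_STRTAB = 0, 1, 5


def needed_libraries(data):
    """Return the DT_NEEDED names found in a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian ELF")
    # ELF64 header: e_phoff at byte 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)

    loads, dynamic = [], None
    for i in range(e_phnum):
        # First six fields of Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz.
        p_type, _, p_offset, p_vaddr, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            loads.append((p_vaddr, p_offset, p_filesz))
        elif p_type == PT_DYNAMIC:
            dynamic = (p_offset, p_filesz)
    if dynamic is None:
        return []  # no dynamic section, e.g. a statically linked binary

    name_offsets, strtab_vaddr = [], None
    for pos in range(dynamic[0], dynamic[0] + dynamic[1], 16):
        d_tag, d_val = struct.unpack_from("<qQ", data, pos)  # Elf64_Dyn
        if d_tag == DT_NULL:
            break
        if d_tag == DT_NEEDED:
            name_offsets.append(d_val)
        elif d_tag == DT_STRTAB:
            strtab_vaddr = d_val
    if strtab_vaddr is None:
        return []

    # DT_STRTAB holds a virtual address; translate it to a file offset
    # using the PT_LOAD segment that covers it.
    for vaddr, offset, filesz in loads:
        if vaddr <= strtab_vaddr < vaddr + filesz:
            strtab_off = strtab_vaddr - vaddr + offset
            break
    else:
        raise ValueError("string table is not inside any PT_LOAD segment")

    names = []
    for rel in name_offsets:
        start = strtab_off + rel
        names.append(data[start:data.index(b"\0", start)].decode())
    return names


if __name__ == "__main__":
    with open(sys.executable, "rb") as f:
        try:
            print(sys.executable, needed_libraries(f.read()))
        except ValueError as exc:
            print(sys.executable, "skipped:", exc)
```

To inspect another binary, pass its bytes instead, for example the contents of /sbin/getty. Note that this lists only the direct dependencies named in the file, while ldd also resolves them to paths and follows dependencies of dependencies.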