If I compile this program:
#include <stdio.h>
int main(int argc, char** argv) {
    printf("hello world!\n");
    return 0;
}
for x86-64, the asm output uses movl $.LC0, %edi / call puts. (See full asm output / compile options on godbolt.)
My question is: How can GCC know that the string’s address can fit in a 32-bit immediate operand? Why doesn’t it need to use movabs $.LC0, %rdi (i.e. a mov r64, imm64, not a zero- or sign-extended imm32)?
AFAIK, there’s nothing saying the loader has to decide to load the data section at any particular address. If the string is stored at some address above 1ULL << 32, then the higher bits will be ignored by the movl. I get similar behavior with clang, so I don’t think this is unique to GCC.
The reason I care is I want to create my own data segment that lives in memory at any arbitrary address I choose (above 2^32 potentially).
Answer
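GCC can assume this because the default code model, -mcmodel=small, requires the program and its symbols to be linked in the lower 2 GB of the address space. From the GCC manual: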
https://gcc.gnu.org/onlinedocs/gcc-4.5.3/gcc/i386-and-x86_002d64-Options.html
3.17.15 Intel 386 and AMD x86-64 Options
-mcmodel=small
Generate code for the small code model: the program and its symbols must be linked in the lower 2 GB of the address space. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
-mcmodel=kernel
Generate code for the kernel code model. The kernel runs in the negative 2 GB of the address space. This model has to be used for Linux kernel code.
-mcmodel=medium
Generate code for the medium model: The program is linked in the lower 2 GB of the address space. Small symbols are also placed there. Symbols with sizes larger than -mlarge-data-threshold are put into large data or bss sections and can be located above 2GB. Programs can be statically or dynamically linked.
-mcmodel=large
Generate code for the large model: This model makes no assumptions about addresses and sizes of sections.
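To make the "lower 2 GB" and "negative 2 GB" wording concrete, here is a minimal sketch of the arithmetic behind the two 32-bit immediate forms. The helper names fits_zext32 and fits_sext32 are made up for illustration and are not part of GCC or any library. An address in the lower 2 GB passes both checks, which is why the small model can use movl $.LC0, %edi. An address in the top 2 GB passes only the sign-extended check, which is what the kernel model relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can addr be loaded with movl $imm32, %r32?
   Writing a 32-bit register zero-extends into the 64-bit register,
   so the upper 32 bits of the address must all be zero. */
bool fits_zext32(uint64_t addr) {
    return addr <= UINT64_C(0xFFFFFFFF);
}

/* Hypothetical helper: can addr be encoded as a sign-extended imm32
   (e.g. mov r64, imm32)? Bits 63..31 must be all zeros (lower 2 GB)
   or all ones (the "negative" 2 GB at the top of the address space). */
bool fits_sext32(uint64_t addr) {
    return addr <= UINT64_C(0x7FFFFFFF) ||
           addr >= UINT64_C(0xFFFFFFFF80000000);
}
```

For example, fits_zext32(0x400000) and fits_sext32(0x400000) are both true, while an address of 1ULL << 32 fails both checks and would need the full mov r64, imm64 form.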
https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html
3.18.1 AArch64 Options
-mcmodel=tiny
Generate code for the tiny code model. The program and its statically defined symbols must be within 1GB of each other. Pointers are 64 bits. Programs can be statically or dynamically linked. This model is not fully implemented and mostly treated as ‘small’.
-mcmodel=small
Generate code for the small code model. The program and its statically defined symbols must be within 4GB of each other. Pointers are 64 bits. Programs can be statically or dynamically linked. This is the default code model.
-mcmodel=large
Generate code for the large code model. This makes no assumptions about addresses and sizes of sections. Pointers are 64 bits. Programs can be statically linked only.
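For the goal stated in the question (a data segment at an arbitrary address), the large model is the one that makes no assumptions about where a symbol lives. Here is a rough sketch of one way to set that up, assuming GCC or clang targeting ELF. The section name .mydata and the function names are hypothetical. Placing the section at a high address would also need a linker script or linker option, which is not shown.

```c
#include <stdint.h>

/* Assumption: ".mydata" is a made-up section name. A linker script
   (not shown) would decide the address this section is loaded at. */
__attribute__((section(".mydata")))
int my_value = 42;

/* When built with -mcmodel=large, the compiler should not assume
   that &my_value fits in 32 bits when generating this access. */
int read_my_value(void) {
    return my_value;
}

/* Returns the run-time address so it can be inspected or printed. */
uintptr_t my_value_address(void) {
    return (uintptr_t)&my_value;
}
```

Without a linker script, the linker simply places .mydata alongside the other data sections, so the code behaves the same in either case. Only the address returned by my_value_address() differs.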