Assembly included. This weekend I tried to get my own small library running without any C libs and the thread local stuff is giving me problems. Below you can see I created a struct called Try1
(because it’s my first attempt!) If I set the thread local variable and use it, the code seems to execute fine. If I call a const method on Try1 with a global variable it seems to run fine. Now if I do both, it’s not fine. It segfaults despite me being able to access members and running the function with a global variable. The code will print Hello and Hello2 but not Hello3
I suspect the problem is the address of the variable. I tried using an if statement to print the first hello. if ((s64)&t1 > (s64)buf+1024*16)
It was true so it means the pointer isn’t where I thought it was. Also it isn’t -8 as gdb suggest (it’s a signed compare and I tried 0 instead of buf)
Assembly under the c++ code. First line is the first call to write
//test.cpp //clang++ or g++ -std=c++20 -g -fno-rtti -fno-exceptions -fno-stack-protector -fno-asynchronous-unwind-tables -static -nostdlib test.cpp -march=native && ./a.out #include <immintrin.h> typedef unsigned long long int u64; ssize_t my_write(int fd, const void *buf, size_t size) { register int64_t rax __asm__ ("rax") = 1; register int rdi __asm__ ("rdi") = fd; register const void *rsi __asm__ ("rsi") = buf; register size_t rdx __asm__ ("rdx") = size; __asm__ __volatile__ ( "syscall" : "+r" (rax) : "r" (rdi), "r" (rsi), "r" (rdx) : "cc", "rcx", "r11", "memory" ); return rax; } void my_exit(int exit_status) { register int64_t rax __asm__ ("rax") = 60; register int rdi __asm__ ("rdi") = exit_status; __asm__ __volatile__ ( "syscall" : "+r" (rax) : "r" (rdi) : "cc", "rcx", "r11", "memory" ); } struct Try1 { u64 val; constexpr Try1() { val=0; } u64 Get() const { return val; } }; static char buf[1024*8]; //originally mmap but lets reduce code static __thread u64 sanity_check; static __thread Try1 t1; static Try1 global; extern "C" int _start() { auto tls_size = 4096*2; auto originalFS = _readfsbase_u64(); _writefsbase_u64((u64)(buf+4096)); global.val = 1; global.Get(); //Executes fine sanity_check=6; t1.val = 7; my_write(1, "Hellon", sanity_check); my_write(1, "Hello2n", t1.val); //Still fine my_write(1, "Hello3n", t1.Get()); //crash! :/ my_exit(0); return 0; }
Asm:
4010b4: e8 47 ff ff ff call 401000 <_Z8my_writeiPKvm> 4010b9: 64 48 8b 04 25 f8 ff mov rax,QWORD PTR fs:0xfffffffffffffff8 4010c0: ff ff 4010c2: 48 89 c2 mov rdx,rax 4010c5: 48 8d 05 3b 0f 00 00 lea rax,[rip+0xf3b] # 402007 <_ZNK4Try13GetEv+0xeef> 4010cc: 48 89 c6 mov rsi,rax 4010cf: bf 01 00 00 00 mov edi,0x1 4010d4: e8 27 ff ff ff call 401000 <_Z8my_writeiPKvm> 4010d9: 64 48 8b 04 25 00 00 mov rax,QWORD PTR fs:0x0 4010e0: 00 00 4010e2: 48 05 f8 ff ff ff add rax,0xfffffffffffffff8 4010e8: 48 89 c7 mov rdi,rax 4010eb: e8 28 00 00 00 call 401118 <_ZNK4Try13GetEv> 4010f0: 48 89 c2 mov rdx,rax 4010f3: 48 8d 05 15 0f 00 00 lea rax,[rip+0xf15] # 40200f <_ZNK4Try13GetEv+0xef7> 4010fa: 48 89 c6 mov rsi,rax 4010fd: bf 01 00 00 00 mov edi,0x1 401102: e8 f9 fe ff ff call 401000 <_Z8my_writeiPKvm> 401107: bf 00 00 00 00 mov edi,0x0 40110c: e8 12 ff ff ff call 401023 <_Z7my_exiti> 401111: b8 00 00 00 00 mov eax,0x0 401116: c9 leave 401117: c3 ret
Advertisement
Answer
The ABI requires that fs:0
contains a pointer with the absolute address of the thread-local storage block, i.e. the value of fsbase
. The compiler needs access to this address to evaluate expressions like &t1
, which here it needs in order to compute the this
pointer to be passed to Try1::Get()
.
It’s tricky to recover this address on x86-64, since the TLS base address isn’t in a convenient general register, but in the hidden fsbase
. It isn’t feasible to execute rdfsbase
every time we need it (expensive instruction that may not be available) nor worse yet to call arch_prctl
, so the easiest solution is to ensure that it’s available in memory at a known address. See this past answer and sections 3.4.2 and 3.4.6 of “ELF Handling for Thread-Local Storage”, which is incorporated by reference into the x86-64 ABI.
In your disassembly at 0x4010d9
, you can see the compiler trying to load from address fs:0x0
into rax
, then adding -8 (the offset of t1
in the TLS block) and moving the result into rdi
as the hidden this
argument to Try1::Get()
. Obviously since you have zeros at fs:0
instead, the resulting pointer is invalid and you get a crash when Try1::Get()
reads val
, which is really this->val
.
I would write something like
void *fsbase = buf+4096; _writefsbase_u64((u64)fsbase); *(void **)fsbase = fsbase;
(Or memcpy(fsbase, &fsbase, sizeof(void *))
might be more compliant with strict aliasing.)