pthread_detach() causes SIGSEGV on 64 bit Linux

Here is my situation: I have to fix a bug in our product. A thread is created as joinable (the default attribute); it must do its work and terminate, and nobody will ever call pthread_join() for it. Before terminating, the thread runs the following code:

{  pthread_detach(pthread_self()); pthread_exit(NULL); }

It works like a charm on every 32-bit Linux distro I have tried, but it causes SIGSEGV on 64-bit distros (Ubuntu 13.04 x86_64 and Debian). I didn’t try Slackware. Here is the backtrace from the core dump:

Core was generated by `IsaVM -s=1 -PrjPath="/home/taf/Linux_Fov_540148/Cmds"  -stgMode=1 -PR -Failover'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0  0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000000000041310d in _kerCltDownloadThr (StartParams=0x6bfce0 <RESFOV>) at ./dker0clt.c:1258
#2  0x00007f5911a7ae9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3  0x00007f591159f3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4  0x0000000000000000 in ?? ()

I figured out how to fix this bug: I set the PTHREAD_CREATE_DETACHED attribute (with pthread_attr_setdetachstate()) for the thread before it is created, and it works as expected.

But my question is: is it a crime to call this code?

{  pthread_detach(pthread_self()); pthread_exit(NULL); }

Does pthread_detach() do something asynchronously after the call that causes pthread_exit() to run into problems? But the crash point is pthread_detach(), not pthread_exit()! I don’t fully understand the reason for this crash. Why does it work on 32 bits? Is it a race condition somewhere in the pthread implementation?

pthread_join() is never called for this thread.

Thanks in advance for any ideas.

Answer

I finished my research using the approaches suggested by the respectable @MaximYegorushkin. AddressSanitizer showed me one buffer overflow in our product, but it isn’t related to my problem (I will definitely fix it later; it is always good to have such a wise tool for hunting bugs). So I decided to override all the necessary pthread_xxx functions using the LD_PRELOAD method. I ran a simple test to be sure my library works as expected:

[HACK] Loading pthread hack.
Starting thread...!
[HACK] pthread_create: thread=7FAC6C86D700
Waiting for 2 seconds...
[HACK] pthread_self: thread=7FAC6C86D700
thread_func: thread id = 7FAC6C86D700
Thread: sin(3.26) = -0.121109
[HACK] pthread_self: thread=7FAC6C86D700
[HACK] pthread_detach: thread=7FAC6C86D700
Terminating...

All lines that start with [HACK] are produced by my threadhack.so library. Then I ran my project with this library, and it pointed me exactly to where the problem is:

Code executed: { pthread_detach(pthread_self()); pthread_exit(NULL); }

Debug traces:

[HACK] pthread_create: thread=7F403251CB00
.....
[HACK] pthread_self: thread=7F403251CB00  
[HACK] pthread_detach: thread=3251CB00    

So we see that pthread_self returns a good thread id, but pthread_detach receives it already mangled (cut to 32 bits). How could this be? I generated assembler code both for my simple working test application, as a reference, and for my project:

Reference application:

call    pthread_self
movq    %rax, %rdi
call    pthread_detach
movl    $0, %edi
call    pthread_exit

So we see here that a movq instruction is used to copy the 64-bit thread id (movq %rax, %rdi). OK, let’s check what GCC generated for my project:

movl    $0, %eax
call    pthread_self
movl    %eax, %edi
movl    $0, %eax
call    pthread_detach
movl    $0, %edi
movl    $0, %eax
call    pthread_exit

Whoa! The project code handles the thread id with 32-bit movl instructions: movl %eax, %edi copies only the least significant 32 bits of the value returned by pthread_self, and the most significant part always ends up as zero. So this is the reason for the mangled thread id. I have no idea why the code is so different; the compilation flags are the same. I saw this bug in GCC 4.7 and I see it in GCC 4.8 (the latest package from Ubuntu 13.10 x86_64).

So at least now I see what is happening. Thanks to @Maxim and brilliant tools. I learned a new thing again.

P.S. I don’t know how to submit a bug report to the GCC team. I can’t reproduce the problem in a small, simple application, and I can’t hand them my project because it is proprietary software and I’m under an NDA not to distribute it.
