Here is my situation: I have to fix a bug in our product. A thread is created as joinable (the default attribute); it must do its work and terminate, and nobody will call pthread_join() for it. So before termination the thread calls the following code:
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
It works like a charm on every 32-bit Linux distro I have tried, but it causes a SIGSEGV on 64-bit distros (Ubuntu 13.04 x86_64 and Debian). I didn't try Slackware. Here is the core:
Core was generated by `IsaVM -s=1 -PrjPath="/home/taf/Linux_Fov_540148/Cmds" -stgMode=1 -PR -Failover'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000041310d in _kerCltDownloadThr (StartParams=0x6bfce0 <RESFOV>) at ./dker0clt.c:1258
#2 0x00007f5911a7ae9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f591159f3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
I figured out how to fix this bug: I set the PTHREAD_CREATE_DETACHED attribute (with pthread_attr_setdetachstate()) for the thread before it is created, and it works as expected.
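A minimal sketch of that workaround, for illustration only. The helper name spawn_detached() and the placeholder worker example_worker() are hypothetical and not taken from the product; only the pthread calls themselves are real API.

```c
#include <pthread.h>

/* Hypothetical placeholder worker; the real thread function is not shown in the post. */
static void *example_worker(void *arg)
{
    (void)arg;
    return NULL;            /* a detached thread simply returns; nobody joins it */
}

/* Hypothetical helper: start fn(arg) in a thread that is detached from birth.
 * Returns 0 on success or a pthread error number. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request the detached state before the thread exists. */
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return rc;
}
```

With this approach the thread never needs to call pthread_detach() on itself, so the crashing call is avoided entirely.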
But my question is: is it a crime to call this code?
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
Does pthread_detach() do something asynchronously after the call that causes problems for pthread_exit()? But the crash point is pthread_detach(), not pthread_exit()! I don't fully understand the reason for this crash. Why does it work on 32 bits? Is it a race condition somewhere in the pthread implementation?
pthread_join() is never called for this thread.
Thanks in advance for any ideas.
Answer
I finished my research using the approaches suggested by the respectable @MaximYegorushkin. AddressSanitizer showed me one buffer overflow in our product, but it isn't related to my problem (I will definitely fix it later; it is always good to have such a wise tool for hunting bugs). So I decided to override all the necessary pthread_xxx functions using the LD_PRELOAD method. I ran a simple test to be sure my library works as expected:
[HACK] Loading pthread hack.
Starting thread...!
[HACK] pthread_create: thread=7FAC6C86D700
Waiting for 2 seconds...
[HACK] pthread_self: thread=7FAC6C86D700
thread_func: thread id = 7FAC6C86D700
Thread: sin(3.26) = -0.121109
[HACK] pthread_self: thread=7FAC6C86D700
[HACK] pthread_detach: thread=7FAC6C86D700
Terminating...
All lines starting with [HACK] are produced by my threadhack.so library.
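The source of threadhack.so is not shown here, so the following is only a sketch of how one such wrapper could look, assuming glibc on Linux. The helper hack_format() is a hypothetical name; the trace format is copied from the output above, and only pthread_detach is wrapped.

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT on glibc */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: build one trace line in the "[HACK] name: thread=ID" style.
 * Assumes pthread_t is an integer type, as it is on Linux/glibc. */
static int hack_format(char *buf, size_t size, const char *func, pthread_t thread)
{
    return snprintf(buf, size, "[HACK] %s: thread=%lX", func, (unsigned long)thread);
}

/* Same name and signature as the real function, so a preloaded library wins symbol lookup. */
int pthread_detach(pthread_t thread)
{
    static int (*real_detach)(pthread_t);
    char line[64];

    /* Look up the next definition of pthread_detach, i.e. the one in the C library. */
    if (real_detach == NULL)
        real_detach = (int (*)(pthread_t))dlsym(RTLD_NEXT, "pthread_detach");

    hack_format(line, sizeof line, "pthread_detach", thread);
    fprintf(stderr, "%s\n", line);

    /* Forward the call with the id exactly as it was received. */
    return real_detach != NULL ? real_detach(thread) : ESRCH;
}
```

Built as a shared object (for example with gcc -shared -fPIC, plus -ldl on older glibc) and loaded through LD_PRELOAD, a wrapper like this logs the thread id exactly as the callee receives it, which is what makes a mangled argument visible.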
Then I ran my project with this library, and it pointed me to exactly where the problem is:
Code executed: { pthread_detach(pthread_self()); pthread_exit(NULL); }
Debug traces:
[HACK] pthread_create: thread=7F403251CB00
.....
[HACK] pthread_self: thread=7F403251CB00
[HACK] pthread_detach: thread=3251CB00
So we see that pthread_self returns a good thread id, but pthread_detach receives it already mangled (cut to 32 bits). How could this be? I generated assembler code both for my simple working test application, as a reference, and for my project:
Reference application:
call pthread_self
movq %rax, %rdi
call pthread_detach
movl $0, %edi
call pthread_exit
So we see here that a movq instruction is used to copy the 64-bit thread id (movq %rax, %rdi). OK, let's check what GCC generated for my project:
movl $0, %eax
call pthread_self
movl %eax, %edi
movl $0, %eax
call pthread_detach
movl $0, %edi
movl $0, %eax
call pthread_exit
Whoa! Here the thread id is moved with 32-bit movl instructions: movl %eax, %edi copies only the least significant 32 bits, and the most significant part always ends up as zero. There is also a movl $0, %eax before every call. So this is the reason for the mangled thread id. I have no idea why the code is so different; the compilation flags are the same. I saw this bug in GCC 4.7, and I still see it in GCC 4.8 (the latest package from Ubuntu 13.10 x86_64).
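The truncation seen in the debug traces can be reproduced in isolation. The sketch below uses a hypothetical helper, truncate_thread_id(), to show what a 64-bit id looks like once it has passed through a 32-bit register. As a side note that the analysis above does not confirm: 32-bit handling of a return value combined with movl $0, %eax before each call is also what GCC typically emits when a function is called without a visible prototype (it is then assumed to return int), so a missing #include <pthread.h> in that source file would be one thing worth checking.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: mimic the effect of "movl %eax, %edi" on a 64-bit value.
 * Only the low 32 bits are kept; the upper 32 bits of the result are zero. */
uint64_t truncate_thread_id(uint64_t id)
{
    uint32_t low = (uint32_t)id;    /* keep the least significant 32 bits */
    return (uint64_t)low;           /* zero-extend back to 64 bits */
}

/* Print an id before and after truncation, in the same hex style as the traces. */
void show_truncation(uint64_t id)
{
    printf("pthread_self:   thread=%llX\n", (unsigned long long)id);
    printf("pthread_detach: thread=%llX\n",
           (unsigned long long)truncate_thread_id(id));
}
```

Calling show_truncation(0x7F403251CB00ULL) reproduces the pair of values from the traces: the id goes in as 7F403251CB00 and comes out as 3251CB00. On a 32-bit system the whole id fits in 32 bits, which would explain why nothing is lost there.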
So at least now I see what is happening. Thanks to @Maxim and these brilliant tools. I learned a new thing again.
P.S. I don't know how to submit a bug report to the GCC team. I can't reproduce the problem in a small, simple application, and I can't hand them my project because it is proprietary software and I am under an NDA not to distribute it.