Here is my situation: I have to fix a bug in our product. A thread is created as joinable, it must do its work and terminate, and nobody will call pthread_join() for it. So the thread is created with the JOINABLE attribute (the default), and before terminating it runs the following code:
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
It works like a charm on every 32-bit Linux distro I have tried, but it causes a SIGSEGV on 64-bit distros (Ubuntu 13.04 x86_64 and Debian). I didn’t try Slackware. Here is the core:
Core was generated by `IsaVM -s=1 -PrjPath="/home/taf/Linux_Fov_540148/Cmds" -stgMode=1 -PR -Failover'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) bt
#0 0x00007f5911a7c009 in pthread_detach () from /lib/x86_64-linux-gnu/libpthread.so.0
#1 0x000000000041310d in _kerCltDownloadThr (StartParams=0x6bfce0 <RESFOV>) at ./dker0clt.c:1258
#2 0x00007f5911a7ae9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#3 0x00007f591159f3fd in clone () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000000000000000 in ?? ()
I figured out how to fix this bug: I set the PTHREAD_CREATE_DETACHED attribute (with pthread_attr_setdetachstate()) for the thread before it is created, and it works as expected.
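A minimal sketch of that fix, assuming a placeholder worker() in place of the real thread function (worker and start_detached_worker are hypothetical names, not taken from the product):

```c
#include <pthread.h>

/* Hypothetical worker: stands in for the real download thread. */
static void *worker(void *arg)
{
    (void)arg;
    /* ... do the work ... */
    return NULL;   /* no pthread_detach() needed: the thread is already detached */
}

/* Starts worker() as a detached thread.
   Returns 0 on success, otherwise the error number from the pthread call. */
int start_detached_worker(void)
{
    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for a detached thread up front instead of detaching it later. */
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, worker, NULL);

    pthread_attr_destroy(&attr);
    return rc;
}
```

With this approach the thread's resources are released automatically when it returns, and pthread_join() must not be called on it.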
But my question is: is it a crime to call this code?
{ pthread_detach(pthread_self()); pthread_exit(NULL); }
Does pthread_detach() do something asynchronously after the call that causes pthread_exit() to run into problems? But the crash point is pthread_detach(), not pthread_exit()! I don’t understand the reason for this crash at all! Why does it work on 32 bits? Is it a race condition somewhere in the pthread implementation?
pthread_join() is never called for this thread.
Thanks in advance for any ideas.
Answer
I finished my research using the approaches suggested by the respected @MaximYegorushkin. AddressSanitizer showed me one buffer overflow in our product, but it isn’t related to my problem (I will definitely fix it later; it is always good to have such a wise tool for hunting bugs). So I decided to override all the necessary pthread_xxx functions using the LD_PRELOAD method. I ran a simple test to be sure my library works as expected:
[HACK] Loading pthread hack.
Starting thread!
[HACK] pthread_create: thread=7FAC6C86D700
Waiting for 2 seconds
[HACK] pthread_self: thread=7FAC6C86D700
thread_func: thread id = 7FAC6C86D700
Thread: sin(3.26) = -0.121109
[HACK] pthread_self: thread=7FAC6C86D700
[HACK] pthread_detach: thread=7FAC6C86D700
Terminating
All lines starting with [HACK] are produced by my threadhack.so library.
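The source of threadhack.so is not shown here, so the following is only a sketch of how such an interposer could look for one of the functions. It assumes glibc on Linux, where pthread_t is an unsigned long; format_trace is a hypothetical helper name, and the trace format is copied from the log above.

```c
#define _GNU_SOURCE          /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: formats one trace line in the style of the log above. */
int format_trace(char *buf, size_t size, const char *func, pthread_t thread)
{
    /* Assumes pthread_t is an integer type (true for glibc on Linux). */
    return snprintf(buf, size, "[HACK] %s: thread=%lX",
                    func, (unsigned long)thread);
}

/* Interposed pthread_detach(): log the argument, then call the real one. */
int pthread_detach(pthread_t thread)
{
    static int (*real_detach)(pthread_t);
    char line[64];

    /* Look up the next definition of pthread_detach in load order. */
    if (real_detach == NULL)
        real_detach = (int (*)(pthread_t))dlsym(RTLD_NEXT, "pthread_detach");

    format_trace(line, sizeof line, "pthread_detach", thread);
    fprintf(stderr, "%s\n", line);

    return real_detach != NULL ? real_detach(thread) : ENOSYS;
}
```

Built as a shared object (for example with gcc -shared -fPIC threadhack.c -o threadhack.so -ldl) and loaded through LD_PRELOAD, a wrapper like this sees the argument exactly as the caller passed it, which is what makes a mangled thread id visible.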
Then I ran my project with this library, and it pointed me exactly to where the problem is:
Code executed: { pthread_detach(pthread_self()); pthread_exit(NULL); }
Debug traces:
[HACK] pthread_create: thread=7F403251CB00
..
[HACK] pthread_self: thread=7F403251CB00
[HACK] pthread_detach: thread=3251CB00
So we see that pthread_self returns a good thread id, but pthread_detach receives it already mangled (cut to 32 bits). How could this be? I generated assembler code for both my simple working test application, as a reference, and for my project:
Reference application:
call pthread_self
movq %rax, %rdi
call pthread_detach
movl $0, %edi
call pthread_exit
So we see here that a movq instruction is used to copy the 64-bit thread id (movq %rax, %rdi). OK, let’s check what GCC generated for my project:
movl $0, %eax
call pthread_self
movl %eax, %edi
movl $0, %eax
call pthread_detach
movl $0, %edi
movl $0, %eax
call pthread_exit
Whoa! The thread id is copied with a 32-bit movl (movl %eax, %edi), so only the least significant 32 bits are passed on; a write to a 32-bit register on x86-64 clears the upper half, so the most significant part always ends up as zero. There is also an extra movl $0, %eax before every call. So this is the reason for the mangled thread id. I have no idea why the code is so different: the compilation flags are the same. I saw this bug with GCC 4.7 and I see it with GCC 4.8 (the latest package from Ubuntu 13.10 x86_64).
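The truncation seen in the trace can be reproduced in isolation. The sketch below only simulates the effect with an explicit cast; pass_through_32bit and print_truncation are hypothetical names. As a hedged side note: one classic way for a C compiler to emit this shape of code (a 32-bit copy of the return value plus movl $0, %eax before each call) is a call to a function with no visible prototype, for example when #include <pthread.h> is missing in that source file, because older C dialects then assume the function returns int. Nothing in this post confirms that this is what happened in the project; compiling with -Wall or -Wimplicit-function-declaration would show whether it applies.

```c
#include <stdint.h>
#include <stdio.h>

/* Simulates a 64-bit id being handed over through a 32-bit register:
   only the low 32 bits survive and the result is zero-extended. */
uint64_t pass_through_32bit(uint64_t id)
{
    uint32_t low = (uint32_t)id;   /* what fits in %eax / %edi */
    return (uint64_t)low;          /* upper half is now zero */
}

/* Prints an id before and after the simulated 32-bit hand-over. */
void print_truncation(uint64_t id)
{
    printf("before: %llX\nafter:  %llX\n",
           (unsigned long long)id,
           (unsigned long long)pass_through_32bit(id));
}
```

Calling print_truncation(0x7F403251CB00ULL) gives 7F403251CB00 before and 3251CB00 after, the same pair of values as in the [HACK] trace. On a 32-bit build pthread_t already fits into 32 bits, so nothing is lost there.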
So at least now I see what is happening. Thanks to @Maxim and brilliant tools. I learned a new thing again.
P.S. I don’t know how to submit a bug report to the GCC team. I can’t reproduce the problem in a small, simple application, and I can’t hand them my project because it is proprietary software and I’m under an NDA not to distribute it.