I’m writing a ray tracer.
Recently, I added threading to the program to exploit the additional cores on my i5 Quad Core.
In a weird turn of events the debug version of the application is now running slower, but the optimized build is running faster than before I added threading.
I’m passing the “-g -pg” flags to gcc for the debug build and the “-O3” flag for the optimized build.
Host system: Ubuntu Linux 10.04 AMD64.
I know that debug symbols add significant overhead to the program, but the relative performance has always been maintained, i.e. a faster algorithm has always run faster in both debug and optimized builds.
Any idea why I’m seeing this behavior?
Debug version is compiled with “-g3 -pg”. Optimized version with “-O3”.
Optimized, no threading: 0m4.864s
Optimized, threading: 0m2.075s
Debug, no threading: 0m30.351s
Debug, threading: 0m39.860s
Debug, threading, after "strip": 0m39.767s
Debug, no threading (no -pg): 0m10.428s
Debug, threading (no -pg): 0m4.045s
This convinces me that “-g3” is not to blame for the odd performance delta, but that it’s rather the “-pg” switch. It’s likely that the “-pg” option adds some sort of locking mechanism to measure thread performance.
Since “-pg” is broken on threaded applications anyway, I’ll just remove it.
Answer
What do you get without the -pg flag? That’s not debugging symbols (which don’t affect the code generation), that’s for profiling (which does).
It’s quite plausible that profiling in a multithreaded process requires additional locking which slows the multithreaded version down, even to the point of making it slower than the non-multithreaded version.
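To make the locking hypothesis concrete, here is a minimal sketch in C with pthreads. It is not gprof's actual instrumentation, and whether "-pg" really takes a lock like this is an assumption. The names hook_lock, worker_arg, worker and count_calls are made up for illustration. The sketch only shows how a hook that runs on every function call and guards shared state with one mutex would make all threads queue up behind each other.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-call profiling bookkeeping.
 * This is NOT what gcc emits for -pg; it only models a shared, locked counter. */
static pthread_mutex_t hook_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_calls = 0;

typedef struct {
    long iterations;   /* number of simulated function calls for this worker */
    int use_lock;      /* 1: update the shared counter under the mutex, 0: private counter */
    long local_calls;  /* private tally, only used when use_lock == 0 */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        if (arg->use_lock) {
            /* Every simulated call from every thread contends for the same mutex. */
            pthread_mutex_lock(&hook_lock);
            shared_calls++;
            pthread_mutex_unlock(&hook_lock);
        } else {
            /* No shared state is touched, so threads do not wait on each other. */
            arg->local_calls++;
        }
    }
    return NULL;
}

/* Runs nthreads workers and returns the total number of recorded calls,
 * or -1 if memory or a thread could not be allocated. */
long count_calls(int nthreads, long iterations, int use_lock)
{
    if (nthreads <= 0)
        return 0;

    pthread_t *threads = malloc((size_t)nthreads * sizeof *threads);
    worker_arg *args = malloc((size_t)nthreads * sizeof *args);
    if (threads == NULL || args == NULL) {
        free(threads);
        free(args);
        return -1;
    }

    shared_calls = 0;
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        args[i].iterations = iterations;
        args[i].use_lock = use_lock;
        args[i].local_calls = 0;
        if (pthread_create(&threads[i], NULL, worker, &args[i]) != 0)
            break;
        started++;
    }

    long total = 0;
    for (int i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        total += args[i].local_calls;
    }
    total += shared_calls;

    free(threads);
    free(args);
    return started == nthreads ? total : -1;
}
```

Calling count_calls(4, iterations, 1) and count_calls(4, iterations, 0) from a small main and timing each run is one way to see how much the shared lock costs on a given machine. Both variants return the same total; only the amount of waiting between threads differs.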