Looking for an accurate way to micro benchmark small code paths written in C++ and running on Linux/OSX

I’m looking to do some very basic micro-benchmarking of small code paths, such as tight loops, that I’ve written in C++. I’m running on Linux and OS X, using GCC. What facilities are there for sub-millisecond accuracy? I am thinking that simply running the code path many times (several tens of millions?) will give me enough consistency to get a good reading. If anyone knows of preferable methods, please feel free to suggest them.
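For reference, the "run it many times" idea could be sketched roughly like this. It is only a minimal sketch: `work` is a hypothetical stand-in for the code path under test, `ns_per_call` is a made-up helper name, and `std::chrono::steady_clock` is just one portable timer choice, not necessarily the most precise one.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical code path under test: sums the integers below n.
// It stands in for whatever tight loop you actually want to time.
std::uint64_t work(std::uint64_t n) {
    std::uint64_t sum = 0;
    for (std::uint64_t i = 0; i < n; ++i) sum += i;
    return sum;
}

// Runs work() `iterations` times and returns the average wall-clock
// nanoseconds per call, measured with a monotonic clock.
double ns_per_call(std::uint64_t iterations, std::uint64_t n) {
    assert(iterations > 0);
    volatile std::uint64_t sink = 0;  // volatile store keeps each result "used"
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        sink = work(n + (i & 1));  // varying the input discourages hoisting the call
    }
    auto stop = std::chrono::steady_clock::now();
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Calling something like `ns_per_call(10000000, 64)` a few times and comparing the results gives a rough feel for how stable the reading is; an optimizing compiler may still simplify `work`, so checking the generated assembly is worthwhile.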

Answer

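You can use the “rdtsc” processor instruction on x86/x86_64. On multicore systems, check for the “constant_tsc” capability (listed in /proc/cpuinfo on Linux): it means the counter ticks at a constant rate even with dynamic frequency changes, and on such CPUs the counters of different cores are normally kept in sync. The related “nonstop_tsc” flag means the counter also keeps running while a core sleeps.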
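A minimal sketch of reading the counter from C++, assuming GCC or Clang, where `__rdtsc()` comes from `<x86intrin.h>`. The names `read_ticks` and `ticks_per_iteration` are made up for illustration, and the non-x86 fallback is only there so the sketch compiles elsewhere.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>  // GCC/Clang header that declares __rdtsc()
#endif

// Returns the current tick count. On x86 this is the raw time-stamp
// counter; elsewhere (an assumption of this sketch) it falls back to
// steady_clock nanoseconds so the code still builds.
inline std::uint64_t read_ticks() {
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
#endif
}

// Average ticks per iteration of a hypothetical tight loop.
double ticks_per_iteration(std::uint64_t iterations) {
    assert(iterations > 0);
    volatile std::uint64_t acc = 0;  // volatile keeps the loop from being removed
    std::uint64_t begin = read_ticks();
    for (std::uint64_t i = 0; i < iterations; ++i) acc = acc + i;
    std::uint64_t end = read_ticks();
    return static_cast<double>(end - begin) / static_cast<double>(iterations);
}
```

Note that the ticks are reference cycles of the counter, which on constant_tsc machines need not match the core clock cycles actually spent.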

If your processor does not support constant_tsc, be sure to bind your program to a single core (the taskset utility on Linux).
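Both checks can also be done from inside the program. The following is a rough sketch, assuming Linux with glibc: `has_cpu_flag` and `pin_to_current_cpu` are hypothetical helper names, the first scans /proc/cpuinfo and the second uses `sched_setaffinity`, which is what taskset does from the command line.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <sstream>
#include <string>

#ifdef __linux__
#include <sched.h>  // sched_setaffinity, sched_getcpu (g++ defines _GNU_SOURCE)
#endif

// Returns true if a "flags" line in /proc/cpuinfo lists `flag`.
// Returns false if the file cannot be read (e.g. not on Linux).
bool has_cpu_flag(const std::string& flag) {
    assert(!flag.empty());  // an empty flag name is a caller bug
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.compare(0, 5, "flags") != 0) continue;
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;
        std::istringstream words(line.substr(colon + 1));  // only the flag list
        std::string word;
        while (words >> word) {
            if (word == flag) return true;
        }
    }
    return false;
}

// Pins the calling thread to the CPU it is currently running on.
// Returns false on failure or on non-Linux systems.
bool pin_to_current_cpu() {
#ifdef __linux__
    int cpu = sched_getcpu();
    if (cpu < 0) return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // pid 0 = calling thread
#else
    return false;
#endif
}
```

A benchmark could then start with something like `if (!has_cpu_flag("constant_tsc")) pin_to_current_cpu();` before taking any readings.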

When using rdtsc on out-of-order CPUs (everything besides the early, in-order Intel Atom and maybe some other low-end CPUs), put a serializing instruction such as “cpuid” before it: this keeps the CPU from reordering the counter read relative to the surrounding instructions.

Also, Mac OS X has “Shark”, which can measure some hardware events in your code.

On RDTSC and out-of-order CPUs: more info is in section 18 of the second of Agner Fog’s great optimization manuals, “Optimizing subroutines in assembly language: An optimization guide for x86 platforms” (the main site with all five manuals is http://www.agner.org/optimize/).

http://www.scribd.com/doc/1548519/optimizing-assembly

On all processors with out-of-order execution, you have to insert XOR EAX,EAX / CPUID before and after each read of the counter in order to prevent it from executing in parallel with anything else. CPUID is a serializing instruction, which means that it flushes the pipeline and waits for all pending operations to finish before proceeding. This is very useful for testing purposes.
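The CPUID / RDTSC / CPUID sequence from the quote could look roughly like this with GCC-style inline assembly. This is a sketch, not the manual's own code: the function names are invented for illustration, and the non-x86 branch is only a fallback so it compiles on other targets.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

#if defined(__x86_64__) || defined(__i386__)
// Executes CPUID with EAX = 0 purely for its serializing side effect.
static inline void serialize_cpu() {
    unsigned eax = 0, ebx, ecx, edx;  // eax = 0 plays the role of XOR EAX,EAX
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                         :
                         : "memory");
}

// Reads the 64-bit time-stamp counter from EDX:EAX.
static inline std::uint64_t raw_tsc() {
    unsigned lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}

// CPUID before and after the read, as the quoted passage recommends.
std::uint64_t serialized_tsc() {
    serialize_cpu();
    std::uint64_t t = raw_tsc();
    serialize_cpu();
    return t;
}
#else
// Fallback for non-x86 targets (an assumption of this sketch).
std::uint64_t serialized_tsc() {
    return static_cast<std::uint64_t>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(
            std::chrono::steady_clock::now().time_since_epoch()).count());
}
#endif

// Average ticks per iteration of a hypothetical tight loop; the total
// includes the fixed cost of the CPUID instructions themselves.
double serialized_ticks_per_iteration(std::uint64_t iterations) {
    assert(iterations > 0);
    volatile std::uint64_t acc = 0;
    std::uint64_t begin = serialized_tsc();
    for (std::uint64_t i = 0; i < iterations; ++i) acc = acc + i;
    std::uint64_t end = serialized_tsc();
    return static_cast<double>(end - begin) / static_cast<double>(iterations);
}
```

CPUID is slow, so it helps to measure an empty region first and subtract that overhead, or to time enough iterations that it becomes negligible.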

User contributions licensed under: CC BY-SA