Looking for an accurate way to micro-benchmark small code paths written in C++ and running on Linux/OSX

Question

I'm looking to do some very basic micro-benchmarking of small code paths, such as tight loops, that I've written in C++. I'm running on Linux and OS X, and using GCC. What facilities are there for sub-millisecond accuracy? I am thinking a simple test of running the code path many times (several tens of millions?) will give me enough
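The repeat-many-times approach described in the question can be sketched with std::chrono::steady_clock, which GCC provides on both Linux and OS X. This is a minimal sketch under that assumption, not a full benchmarking harness: the names average_ns_per_call and sample_work are hypothetical, and the measured time includes the loop overhead.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper (not a standard API): calls `fn` `iterations` times and
// returns the average wall-clock time per call in nanoseconds. Loop overhead is
// included, so for very cheap bodies the result overstates the body's cost.
template <typename Fn>
double average_ns_per_call(Fn&& fn, std::uint64_t iterations) {
    assert(iterations > 0);
    // steady_clock is monotonic: it is not affected by system clock adjustments.
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = clock::now();
    // Convert the clock's native duration to floating-point nanoseconds.
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// Hypothetical placeholder workload: one linear congruential step. Reading and
// writing a volatile global stops the compiler from removing it as dead code.
volatile std::uint64_t g_sink = 0;

inline void sample_work() {
    std::uint64_t x = g_sink;
    x = x * 6364136223846793005ULL + 1442695040888963407ULL;
    g_sink = x;
}
```

For example, average_ns_per_call(sample_work, 10000000) runs the placeholder ten million times; compile with optimizations enabled (for example g++ -O2) and substitute the code path you actually want to measure.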

Accepted Answer

You can use the "rdtsc" processor instruction on x86/x86_64. On multicore systems, check for the "constant_tsc" capability in CPUID (the flags in /proc/cpuinfo on Linux): it indicates that the tick counter runs at a constant rate even when the CPU dynamically changes frequency. If your processor does not support constant_tsc, be sure to bind your program to a single core (with the taskset utility on Linux).

When using rdtsc on out-of-order CPUs (all except Intel Atom, and possibly some other low-end CPUs), put a serializing instruction such as "cpuid" before it: cpuid waits for all earlier instructions to complete, so they cannot be reordered past the counter read.

Also, Mac OS X has "Shark", which can measure some hardware events in your code.

RDTSC and out-of-order CPUs: more information is in section 18 of the second of Agner Fog's excellent optimization manuals, Optimizing subroutines in assembly language: An optimization guide for x86 platforms (the main site with all five manuals is http://www.agner.org/optimize/; see also http://www.scribd.com/doc/1548519/optimizing-assembly):

"On all processors with out-of-order execution, you have to insert XOR EAX,EAX / CPUID before and after each read of the counter in order to prevent it from executing in parallel with anything else. CPUID is a serializing instruction, which means that it flushes the pipeline and waits for all pending operations to finish before proceeding. This is very useful for testing purposes."
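A minimal sketch of the "XOR EAX,EAX / CPUID before and after each read" recipe, assuming GCC or Clang extended inline assembly on x86-64 only. The names serialized_rdtsc and ticks_for are hypothetical, and the non-x86 fallback that returns 0 exists only so the sketch still compiles on other architectures.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__)
// Reads the time-stamp counter with CPUID before and after RDTSC. CPUID is
// serializing, so earlier instructions finish before RDTSC executes and later
// ones cannot start until the read is done. The "memory" clobber also keeps
// the compiler from moving loads/stores across the read.
inline std::uint64_t serialized_rdtsc() {
    std::uint32_t lo, hi;
    asm volatile(
        "xor %%eax, %%eax\n\t"
        "cpuid\n\t"            // serialize before the read
        "rdtsc\n\t"            // EDX:EAX <- time-stamp counter
        "mov %%eax, %0\n\t"
        "mov %%edx, %1\n\t"
        "xor %%eax, %%eax\n\t"
        "cpuid\n\t"            // serialize after the read
        : "=r"(lo), "=r"(hi)
        :
        : "rax", "rbx", "rcx", "rdx", "memory");
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
#else
// This sketch only covers x86-64; other architectures have no RDTSC.
inline std::uint64_t serialized_rdtsc() { return 0; }
#endif

// Hypothetical helper: TSC ticks spent running `fn` `iterations` times.
// With constant_tsc the ticks are at the counter's fixed rate rather than the
// current core clock, and the total includes the cost of the two CPUID reads.
template <typename Fn>
std::uint64_t ticks_for(Fn&& fn, std::uint64_t iterations) {
    assert(iterations > 0);
    const std::uint64_t start = serialized_rdtsc();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        fn();
    }
    const std::uint64_t stop = serialized_rdtsc();
    return stop - start;
}
```

Dividing the returned ticks by the iteration count gives an approximate per-iteration cost. To follow the pinning advice, the resulting binary can be run on one core, for example with taskset -c 0 ./bench (bench being a placeholder binary name).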


Answer