Why does the QEMU run differ from the native run?

What did I do?
I ran qemu-x86_64 -singlestep -d nochain,cpu ./dummy to dump all registers of a dummy program after each instruction, and used grep to extract the RIP values into a text file (qemu_rip_dump.txt). I then single-stepped the same dummy program with ptrace and dumped the RIP value after each instruction into another text file (ptrace_rip_dump.txt). Finally, I compared the two files with diff.
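For reference, here is a minimal sketch of the ptrace side of this setup, assuming a 64-bit x86 Linux host; ./dummy is the test binary from above, and error handling is trimmed for brevity:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();
    if (child == 0) {
        /* Child: ask to be traced, then exec the test binary.
         * The exec delivers a trap stop to the tracer. */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("./dummy", "dummy", (char *)NULL);
        _exit(1);
    }

    int status;
    waitpid(child, &status, 0);          /* wait for the stop at exec */

    while (WIFSTOPPED(status)) {
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        printf("%llx\n", (unsigned long long)regs.rip);

        /* Execute exactly one instruction, then stop again. */
        ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        waitpid(child, &status, 0);
    }
    return 0;
}
```

Redirecting its output to ptrace_rip_dump.txt gives a file directly comparable with the grepped QEMU log.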

What result did I expect?
I expected both runs of the dummy program to execute the same instructions, so the two dump files should have been identical (the same RIP values, and the same number of them).

What result did I actually get?
Ptrace dumped about 33,500 RIP values and QEMU dumped about 29,800. The RIP values in the two files start to differ from the 240th instruction onward. Most of the values are identical, but ptrace executes about 5,500 instructions that QEMU doesn't, and QEMU executes about 1,800 instructions that ptrace doesn't, leaving a net difference of about 3,700 instructions. The two runs seem to diverge throughout the whole program; for example, there is a block of roughly 3,500 instructions around instructions 26,500-30,000 (cleanup?) that the native run executes but QEMU does not.

What is my question?
Why are the RIP values not the same throughout the whole execution of the program, and most importantly: what do I have to do to make both runs identical?

Extra Info

  • The dummy program was a main function that returns 0, but this problem exists in every executable I have traced.
  • I have tried forcing QEMU to use the ld-linux-x86-64.so.2 linker with -L /lib64/; this had no effect.
  • If I run QEMU multiple times, the dumps are identical (equal number and values of RIPs); the same goes for ptrace.

Answer

With a “does nothing” program like the one you’re testing, most of the execution will be in the guest dynamic linker and libc. Those do a lot of work behind the scenes before your program gets control, and some of that work differs between a native run and a QEMU run. There are two main sources of divergence, judging by some of the extra detail you give in the comments:

  1. The environment QEMU provides to the guest binary is not 100% identical to the one a real host kernel provides; it’s only intended to be “close enough that correct guest binaries behave in a reasonable way”. For instance, there is a data structure passed to the guest called the “ELF auxiliary vector”, which contains information such as “what CPU features are supported” and “what user ID are you executing as”. The dynamic linker iterates through this data structure at startup, so minor, harmless differences in which entries are in the vector, and in what order, will cause slightly different execution paths in the guest code (the sketch after this list shows how to inspect these entries).

  2. The CPU QEMU emulates does not provide exactly the same features as your host CPU; there’s no support for emulating AVX, for instance. The guest libc adjusts its behaviour to take advantage of whatever CPU features are available, so it picks different optimised versions of functions like memcpy() or strlen() under the hood. Since the dynamic linker ends up calling these functions, this also causes execution to diverge.
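You can observe the first point directly (and the CPU feature bits behind the second) by printing a few auxiliary-vector entries and comparing the output of a native run with a run under qemu-x86_64. A minimal sketch, assuming glibc’s getauxval() (glibc 2.16 or later); the particular entries printed are just illustrative:

```c
#include <stdio.h>
#include <sys/auxv.h>

int main(void)
{
    /* Feature bits the dynamic linker and libc key their decisions on. */
    printf("AT_HWCAP    = %#lx\n", getauxval(AT_HWCAP));
    printf("AT_HWCAP2   = %#lx\n", getauxval(AT_HWCAP2));

    /* A few of the other entries the linker iterates over at startup. */
    printf("AT_UID      = %lu\n", getauxval(AT_UID));
    printf("AT_PAGESZ   = %lu\n", getauxval(AT_PAGESZ));

    const char *plat = (const char *)getauxval(AT_PLATFORM);
    printf("AT_PLATFORM = %s\n", plat ? plat : "(not present)");
    return 0;
}
```

Any entry that differs between the two runs is a candidate explanation for an early divergence in the linker’s startup path.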

You may be able to work around some of this by restricting the instruction trace you examine so that it starts at the beginning of the main function, which avoids tracing the dynamic-linker startup entirely. I can’t think of a way to work around the differences in which CPU features are available on the host versus under QEMU, though.
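One way to do that on the ptrace side is to run the child to a breakpoint at main and only start single-stepping from there. A rough sketch, assuming a non-PIE binary; MAIN_ADDR is a placeholder for the address of main, which you would look up with something like nm ./dummy:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAIN_ADDR 0x401106UL   /* placeholder: take this from nm ./dummy */

int main(void)
{
    pid_t child = fork();
    if (child == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("./dummy", "dummy", (char *)NULL);
        _exit(1);
    }
    int status;
    waitpid(child, &status, 0);

    /* Plant an int3 (0xcc) at main, run to it, then restore the byte. */
    long orig = ptrace(PTRACE_PEEKTEXT, child, (void *)MAIN_ADDR, NULL);
    ptrace(PTRACE_POKETEXT, child, (void *)MAIN_ADDR,
           (void *)((orig & ~0xffUL) | 0xcc));
    ptrace(PTRACE_CONT, child, NULL, NULL);
    waitpid(child, &status, 0);

    struct user_regs_struct regs;
    ptrace(PTRACE_GETREGS, child, NULL, &regs);
    regs.rip = MAIN_ADDR;                 /* back up over the int3 */
    ptrace(PTRACE_SETREGS, child, NULL, &regs);
    ptrace(PTRACE_POKETEXT, child, (void *)MAIN_ADDR, (void *)orig);

    /* Single-step and dump RIP from main onwards, as before. */
    while (WIFSTOPPED(status)) {
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        printf("%llx\n", (unsigned long long)regs.rip);
        ptrace(PTRACE_SINGLESTEP, child, NULL, NULL);
        waitpid(child, &status, 0);
    }
    return 0;
}
```

The QEMU log can be trimmed to match by discarding everything in qemu_rip_dump.txt before the first occurrence of main’s address.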
