Skip to content
Advertisement

Why do C++ and strace disagree on how long the open() system call is taking?

I have a program which opens a large number of files. I am timing the execution of a C++ loop which literally just opens and closes the files using both a C++ timer and strace. Strangely the system time and the time logged by C++ (which agree with each other) are orders of magnitude larger than the time the time strace claims was spent in system calls. How can this be? I have put the source and output below.

This all came about because I found that my application was spending an unreasonable amount of time just to open files. To help me pin down the problem I wrote the following test code (for reference the file “files.csv” is just a list with one filepath per line):

#include <stdio.h>
#include...

using namespace std;

int main(){
  timespec start, end;
  ifstream fin("files.csv");
  string line;
  vector<string> files;
  while(fin >> line){
    files.push_back(line);
  }
  fin.close();

  clock_gettime(CLOCK_MONOTONIC, &start);
  for(int i=0; i<500; i++){
    size_t filedesc = open(files[i].c_str(), O_RDONLY);
    if(filedesc < 0) printf("error in open");
    if(close(filedesc)<0) printf("error in close");
  }
  clock_gettime(CLOCK_MONOTONIC, &end);
  printf("  %fs elapsedn", (end.tv_sec-start.tv_sec) + ((float)(end.tv_nsec - start.tv_nsec))/1000000000);
  return 0;
}

And here is what I get when I run it:

-bash$ time strace -ttT -c ./open_stuff
  5.162448s elapsed <------ Output from C++ code

% time     seconds  usecs/call     calls    errors syscall 
------ ----------- ----------- --------- --------- ----------------
 99.72    0.043820          86       508           open  <------output from strace
  0.15    0.000064           0       508           close
  0.14    0.000061           0       705           read
  0.00    0.000000           0         1           write
  0.00    0.000000           0         8           fstat
  0.00    0.000000           0        25           mmap
  0.00    0.000000           0        12           mprotect
  0.00    0.000000           0         3           munmap
  0.00    0.000000           0        52           brk
  0.00    0.000000           0         2           rt_sigaction
  0.00    0.000000           0         1           rt_sigprocmask
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           getrlimit
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         3         1 futex
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00    0.043945                  1834         2 total

real    0m5.821s  <-------output from time
user    0m0.031s
sys     0m0.084s

In theory the reported “elapsed” time from C++ should be the execution time of the the calls to open(2) plus the minimal overhead of executing a for loop 500 times. And yet the sum of the total time in open(2) and close(1) calls from strace is 99% shorter . I cannot figure out what is going on.

PS The difference between the C elapsed time and system time is due to the fact that files.csv actually contains tens of thousands of paths, which all get loaded.

Advertisement

Answer

Comparing elapsed time with execution time is like comparing apples with orange juice. (One of them is missing the pulp 🙂 ) To open a file, the system has to find and read the appropriate directory entry… and if the paths are deep, it might need to rrad a number of directory entries. If the entries are not cached, they will need to be read from disk, which will involve a disk seek. While the disk heads are moving, and while the sector is spinning around to where the disk heads are, the wall clock keeps ticking, but the CPU can be doing other stuff (if there is work to do.) So that counts as elapsed time — the inexorable clock ticks on — but not execution time.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement