Is there a way to figure out where in a file a program is reading from? It seems like it might be doable with strace or dtrace.
To clarify the question and give motivation, say I have a 10GB log file and am counting the number of unique lines:
$ cat log.txt | sort | uniq | wc -l
Can I check where in the file cat is currently at, effectively giving the progress of the command? Using lsof, I can't seem to get the offset of the last file read, which I think is what would do the trick:
$ lsof log.txt
COMMAND   PID USER   FD   TYPE DEVICE    SIZE/OFF       NODE NAME
cat     16021 erik    3r   REG   0,22 13416118210 1078133219
Edit: I apologize, the example I gave is too narrow and misses the point. Ideally, for an arbitrary program, I would like to see where in the file reads are occurring (regardless of pipe).
Answer
You can do what you want with the progress command. It shows the progress of coreutils tools such as cat, or of other programs, in reading their files.
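File and offset information is available in Linux in /proc/<PID>/fd and /proc/<PID>/fdinfo. Each entry in /proc/<PID>/fd is a symlink to the open file, and the matching entry in /proc/<PID>/fdinfo has a pos field giving the current offset in bytes.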
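To illustrate how those two /proc entries could be combined by hand, here is a minimal sketch for Linux. The function name file_progress is hypothetical (it is not part of the progress tool), and it assumes GNU stat and that you have permission to read the target process's /proc entries.

```shell
# file_progress PID FD: print the current offset and size of one open file.
# Hypothetical helper; a Linux-only sketch that relies on /proc and GNU stat.
file_progress() {
    pid=$1
    fd=$2
    # The "pos:" line of fdinfo holds the current file offset in bytes.
    pos=$(awk '/^pos:/ {print $2}' "/proc/$pid/fdinfo/$fd") || return 1
    [ -n "$pos" ] || return 1
    # /proc/<PID>/fd/<FD> is a symlink to the open file; -L follows it
    # so that stat reports the size of the file itself.
    size=$(stat -L -c %s "/proc/$pid/fd/$fd") || return 1
    if [ "$size" -gt 0 ]; then
        echo "$pos / $size bytes ($((100 * pos / size))%)"
    else
        # Pipes and some devices report a size of 0.
        echo "$pos bytes (size unknown)"
    fi
}
```

For the example in the question, where lsof shows the log open as descriptor 3, a call along the lines of file_progress "$(pgrep -x cat)" 3 should report how far cat has read, assuming only one cat process is running. The offset reflects what the process has requested from the kernel, so a program that buffers its reads may appear slightly ahead of what it has actually processed.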