
Tracking a program’s progress reading through a file?

Is there a way to figure out where in a file a program is currently reading? It seems like this might be doable with strace or dtrace.

To clarify the question and give motivation, say I have a 10GB log file and am counting the number of unique lines:

$ cat log.txt | sort | uniq | wc -l

Can I check where in the file cat currently is, which would effectively give the progress of the command? Using lsof, I can’t seem to get the offset of the last read, which I think would do the trick:

$ lsof log.txt
COMMAND   PID USER   FD   TYPE DEVICE    SIZE/OFF       NODE NAME
cat     16021 erik    3r   REG   0,22 13416118210 1078133219 

Edit: I apologize, the example I gave is too narrow and misses the point. Ideally, for an arbitrary program, I would like to see where in the file reads are occurring (regardless of whether a pipe is involved).


Answer

You can do what you want with the progress command. It looks for running coreutils tools such as cat (and can be pointed at other programs) and reports how far each one has read through its open files.

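For an arbitrary program on Linux, the same information is available directly from /proc: /proc/<PID>/fd lists the open files as symlinks, and /proc/<PID>/fdinfo/<FD> reports the current offset of each descriptor in its pos: field.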
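As a rough illustration of the /proc approach, the sketch below reads the pos: field and compares it with the file size. The function names fd_offset and fd_progress are made up for this example, and it assumes Linux, a regular file, and a stat that supports -c (GNU coreutils or BusyBox).

```shell
#!/bin/sh
# Hypothetical helpers (not part of any tool): Linux-only, rely on /proc.

# fd_offset PID FD: print the current byte offset of an open descriptor,
# taken from the "pos:" line of /proc/PID/fdinfo/FD.
fd_offset() {
    awk '/^pos:/ { print $2 }' "/proc/$1/fdinfo/$2"
}

# fd_progress PID FD: print "offset size percent" for a regular file.
# Note: plain sh has no local variables, so pos and size leak into the caller.
fd_progress() {
    pos=$(fd_offset "$1" "$2") || return 1
    # -L follows the /proc/PID/fd/FD symlink to the underlying file.
    size=$(stat -L -c %s "/proc/$1/fd/$2") || return 1
    # Print offset and size as strings so very large values are not truncated.
    awk -v p="$pos" -v s="$size" \
        'BEGIN { printf "%s %s %.1f%%\n", p, s, (s ? 100 * p / s : 0) }'
}
```

With the PID and FD from the lsof output in the question (16021 and 3), this would be called as fd_progress 16021 3, and can be rerun periodically to watch the offset advance. Reading another process's fdinfo generally requires being the same user or root.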

User contributions licensed under: CC BY-SA