Skip to content
Advertisement

Bash while read loop extremely slow compared to cat, why?

A simple test script here:

while read LINE; do
        LINECOUNT=$(($LINECOUNT+1))
        if [[ $(($LINECOUNT % 1000)) -eq 0 ]]; then echo $LINECOUNT; fi
done

When I do cat my450klinefile.txt | myscript the CPU locks up at 100% and it can process about 1000 lines a second. About 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second.

Is there a more efficient way to do essentially this. I just need to read a line from stdin, count the bytes, and write it out to a named pipe. But the speed of even this example is impossibly slow.

Every 1Gb of input lines I need to do a few more complex scripting actions (close and open some pipes that the data is being feed to).

Advertisement

Answer

The reason while read is so slow is that the shell is required to make a system call for every byte. It cannot read a large buffer from the pipe, because the shell must not read more than one line from the input stream and therefore must compare each character against a newline. If you run strace on a while read loop, you can see this behavior. This behavior is desirable, because it makes it possible to reliably do things like:

while read size; do test "$size" -gt 0 || break; dd bs="$size" count=1 of=file$(( i++ )); done

in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.

Advertisement