Bash while read loop extremely slow compared to cat, why?

Question

A simple test script here: When I run cat my450klinefile.txt | myscript, the CPU locks up at 100% and it can process only about 1000 lines a second. It takes about 5 minutes to process what cat my450klinefile.txt >/dev/null does in half a second. Is there a more efficient way to do essentially this? I just need to read a line from stdin,

Accepted Answer

The reason while read is so slow is that the shell is required to make a system call for every byte. It cannot read a large buffer from the pipe, because the shell must not read more than one line from the input stream and therefore must compare each character against a newline. If you run strace on a while read loop, you can see this behavior.

This behavior is desirable, because it makes it possible to reliably do things like:

    while read size; do test "$size" -gt 0 || break; dd bs="$size" count=1 of=file$(( i++ )); done

in which the commands inside the loop are reading from the same stream that the shell reads from. If the shell consumed a big chunk of data by reading large buffers, the inner commands would not have access to that data. An unfortunate side-effect is that read is absurdly slow.
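
The stream-sharing point can be checked with a small sketch. It assumes a POSIX-style shell such as bash; the sample input and the variable names (line, rest) are made up for illustration. read takes exactly one line from the pipe, and cat, running in the same group, receives everything after it:

```shell
# read consumes only the first line of the pipe, one byte at a time,
# so the rest of the stream is still there for the next command (cat).
rest=$(printf 'first\nsecond\nthird\n' | { read -r line; cat; })

# Show what cat received after read took its single line.
printf '%s\n' "$rest"
```

If read had pulled a large buffer from the pipe instead, cat would likely have found the stream already drained.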

Answer