
How to efficiently loop through the lines of a file in Bash?

I have a file example.txt with about 3000 lines, each containing a string. A small example file would be:

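The original sample did not survive extraction; assume a file of plain strings, one per line, with some repeats (the strings here are hypothetical stand-ins):

```
apple
banana
apple
cherry
banana
apple
```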

I want to find all the repeated lines in this file and output them. The desired output would be:

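Again reconstructed to match the hypothetical sample above, with each duplicated line reported once:

```
apple
banana
```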

I made a script checkRepetions.sh:

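The script itself was lost; given the description and the cut fragment quoted below, it was presumably a nested loop that re-extracts every pair of lines with external commands, along these lines (a sketch, not the asker's exact code):

```bash
#!/bin/bash
# Compare every pair of lines (i, j); each iteration spawns sed in a
# subshell to re-read the file, so the work is O(n^2) file scans.
n=$(wc -l < example.txt)
for ((i = 1; i <= n; i++)); do
  line1=$(sed -n "${i}p" example.txt)
  for ((j = i + 1; j <= n; j++)); do
    line2=$(sed -n "${j}p" example.txt)
    if [ "$line1" = "$line2" ]; then
      echo "$line1"
    fi
  done
done
```

For 3000 lines that is roughly 4.5 million pairwise comparisons, each forking new processes, which is why it takes minutes.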

However, this script is very slow; it takes more than 10 minutes to run. In Python it takes less than 5 seconds… I tried to store the file in memory by doing lines=$(cat example.txt) and then line1=$(cat $lines | cut -d',' -f$i), but this is still very slow…

Answer

See why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why your script is so slow.

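The answer's code block was lost here; the standard single-pass awk idiom for this job prints a line the second time it is seen, so each duplicated line is output exactly once:

```bash
awk 'seen[$0]++ == 1' example.txt
```

awk reads the file once and counts lines in a hash, so the run time is linear in the file size rather than quadratic.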

To give you an idea of the performance difference between a bash script that's written to be as efficient as possible and an equivalent awk script:

bash:

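The original listing and its time output were lost; an about-as-efficient-as-bash-gets version keeps counts in an associative array and reads the file exactly once (tst.sh is a hypothetical name, and the original timing figures are not reproduced):

```bash
#!/usr/bin/env bash
# tst.sh: single-pass duplicate finder in pure bash.
# Count each line in an associative array (a hash); print a line
# the moment its count reaches 2, i.e. each duplicate exactly once.
declare -A cnt
while IFS= read -r line; do
  cnt["$line"]=$(( ${cnt["$line"]:-0} + 1 ))
  if [ "${cnt["$line"]}" -eq 2 ]; then
    printf '%s\n' "$line"
  fi
done < "$1"
```

Even written this way, bash still runs its read/test/assign machinery once per input line, which is what the awk comparison below illustrates.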

awk:

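Also reconstructed (tst.awk is a hypothetical name), run as awk -f tst.awk file; it is the same logic as the one-liner above:

```awk
# tst.awk: print a line the second time it is seen,
# so each duplicated line is output exactly once.
seen[$0]++ == 1
```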

There is no difference in the output of the two scripts:

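A comparison along these lines (file name hypothetical) prints nothing, since both scripts emit each duplicated line at its second occurrence:

```bash
$ diff <(./tst.sh file100k) <(awk -f tst.awk file100k)
$
```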

The comparison above uses 3rd-run timings, to avoid caching effects, tested against a file generated by the following awk script:

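The generator was lost too; a reconstruction that yields a 100,000-line file in which every distinct string appears about 10 times (the file name and proportions are assumptions):

```bash
# Write 100,000 lines drawn from a pool of 10,000 distinct strings,
# so each string occurs roughly 10 times.
awk 'BEGIN { for (i = 1; i <= 100000; i++) printf "some test string %d\n", i % 10000 }' > file100k
```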

When the input file had zero duplicate lines (generated by seq 100000 > nodups100k), the bash script executed in about the same amount of time as it did above, while the awk script executed much faster than it did above.
