What can I do to optimize this sort?
I am running:
mytime="$(time ( cat quotes_2009-04.txt | tr [:space:] '\n' | grep -v "^\s*$" | sort | uniq -c ) 2>&1 1>/dev/null )"
and then:
echo "$mytime"
getting the following output:
real 134m0.107s
user 105m27.274s
sys 42m34.889s
Here's the dataset that I am using:
A preview of the original dataset:
Here are the details on the Vagrant machine:
What can I do to optimize this sort?
Answer
Split your data into several files, sort each file in parallel, then merge the sorted files together. See here for an example.
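A minimal sketch of the split / sort / merge approach, assuming bash and a standard `split`, `sort` and `uniq`. The function name `count_words`, the default chunk size and the temporary-directory layout are illustrative choices, not something from the question. It also sets `LC_ALL=C` so that all comparisons are byte-wise, which is usually much faster than locale-aware collation but changes the output order to plain byte order.

```shell
#!/usr/bin/env bash
# count_words FILE [LINES_PER_CHUNK]
# Hypothetical helper: prints "count word" pairs for FILE, like
# "tr | grep | sort | uniq -c", but sorts fixed-size chunks concurrently.
# The body runs in a subshell so LC_ALL and the trap do not leak to the caller.
count_words() (
    input="$1"
    lines_per_chunk="${2:-1000000}"   # assumed default; tune for your RAM

    # Byte-wise ordering; every sort and the merge must use the same locale.
    export LC_ALL=C

    tmpdir="$(mktemp -d)" || exit 1
    trap 'rm -rf "$tmpdir"' EXIT

    # One word per line (-s squeezes runs of whitespace), drop empty lines,
    # then cut the stream into chunk files of at most lines_per_chunk lines.
    tr -s '[:space:]' '\n' < "$input" \
        | grep -v '^$' \
        | split -l "$lines_per_chunk" - "$tmpdir/chunk."

    # Sort every chunk in the background. This starts one sort per chunk,
    # so pick a chunk size that keeps the number of chunks reasonable.
    for chunk in "$tmpdir"/chunk.*; do
        [ -e "$chunk" ] || continue
        sort -o "$chunk.sorted" "$chunk" &
    done
    wait

    # Nothing to merge if the input contained no words.
    set -- "$tmpdir"/chunk.*.sorted
    [ -e "$1" ] || exit 0

    # -m merges already-sorted files without re-sorting them.
    sort -m "$@" | uniq -c
)
```

Usage would be something like `count_words quotes_2009-04.txt 5000000 > counts.txt`. Whether this beats a single `sort` depends on the number of cores and on disk speed. GNU `sort` already performs an external merge sort on its own, so it may be worth first trying `LC_ALL=C sort --parallel=4 -S 2G` (both are GNU options; the values here are placeholders) before splitting by hand.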