How to count repeated lines in Shell

cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...

My expected result is as follows:

abc bcd abc ...      

abcd bcde cdef ...  
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I have found a way to do this, but my code is a little noisy:

cat file1.txt | uniq -c | sed -e 's/ \+/ /g' -e 's/^.//g' | awk '{print $0," ",$1}' | sed -e 's/^[2-9] /\n/g' -e 's/^[1] //g' | sed -e 's/[^1]$/\n<!!! pay attention, above sentence has repeated & times !!!> \n/g' -e 's/[1]$//g'

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I was wondering if you could show me a more efficient way to achieve this. Thanks a lot.

Answer

If your lines are not already grouped, then you could use

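awk '
    # First pass over the file: count how often each line occurs.
    NR == FNR {count[$0]++; next}
    # Second pass: print each distinct line once, at its first occurrence.
    !seen[$0]++ {
        print
        if (count[$0] > 1)
            print "... repeated", count[$0], "times"
    }
' file1.txt file1.txt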

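This keeps every distinct line in memory, so it will consume a lot of memory if your file is very large. In that case you might want to sort the file first, so that identical lines become adjacent.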
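As a rough sketch of the sort-first idea, assuming it is acceptable for the output to come out in sorted order rather than in the order of first occurrence: `sort` makes identical lines adjacent, `uniq -c` counts each run, and a small awk script strips the count prefix and adds the notice. The function name `count_repeats` is a hypothetical helper introduced here for illustration.

```shell
#!/bin/sh
# Sketch only: count_repeats is a hypothetical helper name, not part of
# the original answer. It takes one file name as its argument.
count_repeats() {
    # sort makes identical lines adjacent; uniq -c prefixes each distinct
    # line with its count; awk removes that prefix and adds the notice.
    sort -- "$1" | uniq -c | awk '{
        n = $1                        # count reported by uniq -c
        sub(/^ *[0-9]+ /, "")         # strip the padded count and one space
        print
        if (n > 1)
            print "... repeated", n, "times"
    }'
}

# Usage: count_repeats file1.txt
```

Here awk holds only one line at a time, and implementations such as GNU sort spill to temporary files when the input does not fit in memory. If the file is already grouped, as in the question, the `sort` step can be dropped (`uniq -c file1.txt | awk ...`), which also preserves the original line order.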

Answer