
How to count repeated sentences in shell

cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...

My expected result is as follows:

abc bcd abc ...      

abcd bcde cdef ...  
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I have found a way to do this, but my code is a little noisy:

cat file1.txt | uniq -c | sed -e 's/ \+/ /g' -e 's/^.//g' | awk '{print $0," ",$1}' | sed -e 's/^[2-9] /\n/g' -e 's/^[1] //g' | sed -e 's/[^1]$/\n<!!! pay attention, above sentence has repeated & times !!!> \n/g' -e 's/[1]$//g'

abc bcd abc ...

abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>

efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>

hig ...

I was wondering if you could show me a more efficient way to achieve this. Thanks a lot.


Answer

If your lines are not already grouped, then you could use:

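awk '
    # First pass (NR == FNR): count every occurrence of each line.
    NR == FNR {count[$0]++; next}
    # Second pass: print each distinct line once, at its first occurrence.
    !seen[$0]++ {
        print
        if (count[$0] > 1)
            print "... repeated", count[$0], "times"
    }
' file1.txt file1.txt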

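This keeps every distinct line in memory, so it will consume a lot of memory if your file is very large. In that case you might want to sort the file first, so that identical lines end up next to each other.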
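
When identical lines are already adjacent (as in the question's file1.txt, or after sorting), a single streaming pass should be enough. The following is only a sketch, not a tested drop-in: the function name annotate_repeats is made up for this example, and it assumes uniq -c prints each count as optional leading spaces, the number, then one space before the line text.

```shell
# Hypothetical helper (the name is chosen for this sketch only).
# Reads the named files, or standard input when no file is given.
annotate_repeats() {
    # uniq -c collapses each run of identical adjacent lines and
    # prefixes the surviving line with the length of that run.
    uniq -c "$@" | awk '
    {
        n = $1                      # run length reported by uniq -c
        sub(/^ *[0-9]+ /, "")       # drop the count prefix, keep the line text
        print
        if (n > 1)
            print "<!!! pay attention, above sentence has repeated " n " times !!!>"
        print ""                    # blank separator, as in the expected output
    }'
}
```

For a file that is already grouped, call annotate_repeats file1.txt. For ungrouped input, sort file1.txt | annotate_repeats makes identical lines adjacent first, at the cost of printing the lines in sorted order rather than in their original order.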

User contributions licensed under: CC BY-SA