```
$ cat file1.txt
abc bcd abc ...
abcd bcde cdef ...
abcd bcde cdef ...
abcd bcde cdef ...
efg fgh ...
efg fgh ...
hig ...
```
My expected result is as follows:
```
abc bcd abc ...
abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>
efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>
hig ...
```
I have found a way to do this, but my code is a little noisy:
```
cat file1.txt | uniq -c |
    sed -e 's/ \+/ /g' -e 's/^.//g' |
    awk '{print $0," ",$1}' |
    sed -e 's/^[2-9] /\n/g' -e 's/^[1] //g' |
    sed -e 's/[^1]$/\n<!!! pay attention, above sentence has repeated & times !!!>\n/g' -e 's/[1]$//g'
```

Output:

```
abc bcd abc ...
abcd bcde cdef ...
<!!! pay attention, above sentence has repeated 3 times !!!>
efg fgh ...
<!!! pay attention, above sentence has repeated 2 times !!!>
hig ...
```
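The idea behind the pipeline above (let `uniq -c` count adjacent duplicates, then rewrite its output) can be sketched with a single awk stage instead of the `sed`/`awk`/`sed` chain. This is a minimal sketch, not the original poster's code: `annotate_repeats` is a hypothetical name, and it assumes `uniq -c` prints the count padded with leading spaces, then one space, then the line (as GNU `uniq` does). Unlike the `^[2-9]` pattern, it also handles counts of 10 or more.

```shell
#!/bin/sh
# annotate_repeats is a hypothetical helper name. uniq -c prefixes each run
# of adjacent identical lines with its count; awk turns that prefix into
# a separate notice line.
annotate_repeats() {
    uniq -c | awk '{
        n = $1 + 0                    # run length reported by uniq -c
        sub(/^ *[0-9]+ /, "")         # drop the padded count and one space
        print                         # the original line, unchanged
        if (n > 1)
            print "<!!! pay attention, above sentence has repeated " n " times !!!>"
    }'
}

# Sample input from the question.
printf '%s\n' 'abc bcd abc ...' 'abcd bcde cdef ...' 'abcd bcde cdef ...' \
    'abcd bcde cdef ...' 'efg fgh ...' 'efg fgh ...' 'hig ...' | annotate_repeats
```

With a real file this would be `annotate_repeats < file1.txt`. Like `uniq` itself, it only merges duplicates that are adjacent.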
Is there a more efficient way to achieve this? Thanks a lot.
Answer
If your lines are not already grouped, you could use:
```
awk '
    NR == FNR {count[$0]++; next}
    !seen[$0]++ {
        print
        if (count[$0] > 1)
            print "... repeated", count[$0], "times"
    }
' file1.txt file1.txt
```
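This keeps every distinct line in memory (as keys of both `count` and `seen`), so it can use a lot of memory if your file is very large. In that case you might want to sort it first, bearing in mind that sorting changes the order of the lines.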
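Once duplicates are adjacent (as in the question's file, or after sorting), a single awk pass only needs to remember the previous line and how many times in a row it has appeared, so memory use stays constant regardless of file size. This is a sketch of that streaming approach, not part of the original answer; `annotate_runs` is a hypothetical name, and the notice text copies the format from the question.

```shell
#!/bin/sh
# annotate_runs is a hypothetical helper name. It reads grouped input on
# stdin and keeps only the previous line and its run length in memory.
annotate_runs() {
    awk '
        # Print the finished run, plus a notice if it had duplicates.
        function flush() {
            print prev
            if (run > 1)
                print "<!!! pay attention, above sentence has repeated " run " times !!!>"
        }
        # Same as the previous line: extend the run ("" forces a string compare).
        NR > 1 && ($0 "") == prev { run++; next }
        # A different line ends the previous run.
        NR > 1 { flush() }
        # Start a new run with the current line.
        { prev = $0; run = 1 }
        # Flush the last run (skipped for empty input).
        END { if (NR > 0) flush() }
    '
}

# Sample input from the question (already grouped).
printf '%s\n' 'abc bcd abc ...' 'abcd bcde cdef ...' 'abcd bcde cdef ...' \
    'abcd bcde cdef ...' 'efg fgh ...' 'efg fgh ...' 'hig ...' | annotate_runs
```

For ungrouped input, `sort file1.txt | annotate_runs` would report total counts per line, but in sorted rather than original order. The explicit string comparison matters because awk would otherwise compare numeric-looking lines such as `1` and `1.0` as numbers and treat them as equal.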