Sample file
aabbccddeeffgghhiijj
aabbccddeellgghhiijj
aabbccddeeffgghhiijj
I want to skip the 6th field ‘ff’ when comparing lines for uniqueness, and I also want the number of duplicate lines printed in front of each unique line.
I tried this, without any luck:
sort -t'' -k1,5 -k7 --unique xslin1 > xslout
Expected output
3 aabbccddee*gghhiijj
Answer
$ awk -F'' -v OFS='' '{$6="*"} 1' xslin1 | sort | uniq -c
3 aabbccddee*gghhiijj
Discussion
With --unique, sort outputs only unique lines but does not count them; uniq -c is needed for that. Further, sort outputs all unique lines, not just those that sort to the same value.
The above solution takes the simple approach of assigning * to the sixth field, as you wanted in the output, and then uses the standard pipeline, sort | uniq -c, to count how many times each resulting line occurs.
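For anyone who wants to try the pipeline without the original file, here is a minimal self-contained sketch. It assumes a visible '|' as the field separator, which is a stand-in chosen only for illustration and is not the separator used in xslin1; the sample and result variable names are likewise hypothetical.

```shell
# Hypothetical sample data: '|' stands in for the real field separator.
sample='aa|bb|cc|dd|ee|ff|gg|hh|ii|jj
aa|bb|cc|dd|ee|ll|gg|hh|ii|jj
aa|bb|cc|dd|ee|ff|gg|hh|ii|jj'

# Overwrite field 6 with "*" so it no longer affects the comparison,
# then sort so identical lines are adjacent and let uniq -c count them.
result=$(printf '%s\n' "$sample" \
  | awk -F'|' -v OFS='|' '{ $6 = "*" } 1' \
  | sort \
  | uniq -c)

printf '%s\n' "$result"
```

With this sample, all three lines become identical once field 6 is masked, so uniq -c reports a single line with a count of 3. The amount of leading padding before the count varies between uniq implementations.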