Sample file
aabbccddeeffgghhiijj
aabbccddeellgghhiijj
aabbccddeeffgghhiijj
I want to skip the 6th field ‘ff’ when comparing lines for uniqueness, and I also want the number of duplicate lines printed in front of each line.
I tried this, without any luck:
sort -t'' -k1,5 -k7 --unique xslin1 > xslout
Expected output
3 aabbccddee*gghhiijj
Answer
$ awk -F'' -v OFS='' '{$6="*"} 1' xslin1 | sort | uniq -c
3 aabbccddee*gghhiijj
Discussion
With --unique, sort outputs only unique lines but it does not count them. One needs uniq -c for that. Further, sort outputs all unique lines, not just those that sort to the same value.
The above solution takes the simple approach of setting the sixth field to *, as you wanted in the output, and then uses the standard pipeline, sort | uniq -c, to produce the count of unique lines.
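To make the approach easy to try, here is a minimal self-contained sketch. It assumes the fields are separated by a visible `|` character, which is purely a stand-in for illustration and not the separator used in xslin1; the sample records are the three from the question with that hypothetical separator added.

```shell
# Assumption: '|' is a hypothetical field separator chosen for this demo only.
# Mask field 6 with '*', then sort so identical lines are adjacent,
# and let uniq -c prefix each distinct line with its count.
result=$(
  printf '%s\n' \
    'aa|bb|cc|dd|ee|ff|gg|hh|ii|jj' \
    'aa|bb|cc|dd|ee|ll|gg|hh|ii|jj' \
    'aa|bb|cc|dd|ee|ff|gg|hh|ii|jj' |
  awk -F'|' -v OFS='|' '{ $6 = "*" } 1' |
  sort |
  uniq -c
)

# Prints one line: a count of 3 followed by aa|bb|cc|dd|ee|*|gg|hh|ii|jj
printf '%s\n' "$result"
```

Assigning to $6 makes awk rebuild the line using OFS, which is why OFS is set to the same character as the input separator. The sort step matters because uniq only collapses adjacent duplicates.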