Uniq skipping middle part of the line when comparing lines

Sample file

aabbccddeeffgghhiijj

aabbccddeellgghhiijj

aabbccddeeffgghhiijj

I want to skip the 6th field (‘ff’) when comparing lines for uniqueness, and I also want the number of duplicate lines printed in front of each line.

I tried this, without any luck:

sort -t'' -k1,5 -k7 --unique xslin1 > xslout

Expected output

3 aabbccddee*gghhiijj

Answer

$ awk -F'' -v OFS='' '{$6="*"} 1' xslin1 | sort | uniq -c
      3 aabbccddee*gghhiijj

Discussion

With --unique, sort prints one line from each group of lines that compare equal, but it does not count them; uniq -c is needed for that. Note also that sort --unique prints every distinct line, including lines that have no duplicates, not only the lines that are repeated.
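To make the difference concrete, here is a small sketch that runs both commands on the three sample lines from the question. The temporary file and the variable name sample are only for illustration, and -u is used as the short form of --unique.

```shell
# Recreate the three sample lines from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' aabbccddeeffgghhiijj aabbccddeellgghhiijj aabbccddeeffgghhiijj > "$sample"

# sort -u: prints one copy of each distinct line, with no counts.
sort -u "$sample"

# sort | uniq -c: prints the same distinct lines, each prefixed with its count.
sort "$sample" | uniq -c
```

On this sample the first command prints two lines, and the second prints the same two lines with counts of 2 and 1. Neither ignores the sixth field yet, which is why the answer rewrites that field before sorting.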

The answer takes the simple approach of replacing the sixth field with *, as shown in the expected output, and then uses the standard pipeline, sort | uniq -c, to count the resulting identical lines.
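If the fields really are fixed-width with no separator between them, as the sample appears here, the same idea can be sketched with substr instead of field splitting. This is a variant under that assumption, not the answer's command: the two-character field width and the helper name mask_sixth are hypothetical.

```shell
# Recreate the three sample lines from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' aabbccddeeffgghhiijj aabbccddeellgghhiijj aabbccddeeffgghhiijj > "$sample"

# Assumption: every field is exactly two characters wide, so the sixth
# field occupies characters 11-12 of each line.
# mask_sixth (hypothetical helper) replaces those two characters with "*".
mask_sixth() {
    awk '{ print substr($0, 1, 10) "*" substr($0, 13) }' "$1"
}

# Mask the field, then count identical lines with the standard pipeline.
mask_sixth "$sample" | sort | uniq -c
```

On the sample this prints a single line, aabbccddee*gghhiijj, preceded by a count of 3. The amount of padding before the count depends on the uniq implementation.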

User contributions licensed under: CC BY-SA