I am trying to remove duplicate lines based on the value of the 2nd field. For each value of the 1st field, the line with the lowest 2nd field should be retained; any line with a repeated 1st field and a higher 2nd field should be removed.
This is an example of my raw data:
1234 2 ABCD
3234 1 DEFG
1234 1 DEFG
Here is how it should be:
1234 1 DEFG
3234 1 DEFG
So far, based on this post, I came up with this script:
awk '{ if($1 in a){ if($2 < a[$1]){ a[$1]= $2; r[$1]=$0; } else { a[$1]=$2; r[$1]=$0; } } } end {for(x in r) print r[x]}'
But it returns no results.
I am still learning how to use awk, particularly associative arrays.
Any help is welcome. Thanks in advance!
Answer
You can use this awk command:
awk '!($1 in a) || $2 < a[$1] {a[$1]=$2; r[$1]=$0} END {for (i in r) print r[i]}' file
1234 1 DEFG
3234 1 DEFG
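As for why the script in the question prints nothing, two things appear to be at play. The `else` branch pairs with the inner `if ($2 < a[$1])` rather than with `if ($1 in a)`, so a key that has not been seen yet is never stored and `a` stays empty. And awk only treats upper-case `END` as the special pattern; lower-case `end` is an ordinary unset variable, so that block never runs. Below is a commented sketch of the same one-liner spread over several lines. The names `min`, `line`, `key` and the temporary `input` file are illustrative choices, not part of the original, and the trailing `sort` is an assumption added only because the order of a `for (key in line)` loop is unspecified.

```shell
# Sample input from the question, written to a temporary file (hypothetical name).
input=$(mktemp)
printf '%s\n' '1234 2 ABCD' '3234 1 DEFG' '1234 1 DEFG' > "$input"

result=$(awk '
    # Store the line when the key is new, or when its 2nd field is lower
    # than the lowest value seen so far for that key.
    !($1 in min) || $2 < min[$1] {
        min[$1]  = $2   # lowest 2nd field seen for this key
        line[$1] = $0   # the full line that carries it
    }
    # END must be upper case; a lower-case "end" is just an unset variable.
    END {
        for (key in line) print line[key]
    }
' "$input" | sort)   # for-in order is unspecified, so sort for stable output

echo "$result"
rm -f "$input"
```

If the original input order matters, the `sort` can be dropped, but then the order in which the surviving lines are printed depends on the awk implementation.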