I am trying to remove lines whose 1st field is duplicated, keeping the line with the lowest value in the 2nd field. Any line that repeats a 1st field with a higher 2nd field should be removed.
This is an example of my raw data:
```
1234 2 ABCD
3234 1 DEFG
1234 1 DEFG
```
This is the expected output:
```
1234 1 DEFG
3234 1 DEFG
```
So far, based on this post, I came up with this script:
```shell
awk '{
  if ($1 in a) {
    if ($2 < a[$1]) {
      a[$1] = $2;
      r[$1] = $0;
    } else {
      a[$1] = $2;
      r[$1] = $0;
    }
  }
} end {for (x in r) print r[x]}'
```
But it produces no output.
I am still learning awk, particularly associative arrays.
Any help is welcome. Thanks in advance!
Answer
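Your script prints nothing because `end` must be written `END`. In lower case it is just an uninitialized variable used as a pattern, which is always false, so the final loop never runs. On top of that, `if($1 in a)` can never be true, since `a` is only filled inside that branch, and the `else` branch would overwrite the stored line with a higher 2nd field. You can use this awk instead. It stores a line when its 1st field has not been seen yet, or when its 2nd field is lower than the value stored for that field, and prints the stored lines at the end: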
```shell
awk '!($1 in a) || $2 < a[$1] {a[$1]=$2; r[$1]=$0} END {for (i in r) print r[i]}' file
```
Output:
```
1234 1 DEFG
3234 1 DEFG
```
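`for (i in r)` visits array elements in an unspecified order, so the lines may come out in any order. If the output should follow the order in which each 1st field first appears in the input, one option is to record that order in a separate, numerically indexed array. This is a sketch assuming the same whitespace-separated input on stdin; the function name `keep_lowest_ordered` and the array names are illustrative, not part of the original answer:

```shell
#!/bin/sh
# Sketch: for each distinct 1st field, keep the line with the lowest 2nd field,
# printing results in the order each 1st field first appears in the input.
# keep_lowest_ordered is a hypothetical name; it reads lines from stdin.
keep_lowest_ordered() {
  awk '
    # First line for this key: remember its position, value and whole line.
    !($1 in min) {
      order[++n] = $1
      min[$1] = $2
      line[$1] = $0
      next
    }
    # Later line with a lower 2nd field: it replaces the stored line.
    $2 < min[$1] {
      min[$1] = $2
      line[$1] = $0
    }
    # Print the stored lines in first-appearance order.
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  '
}

# Usage with the sample data from the question.
printf '1234 2 ABCD\n3234 1 DEFG\n1234 1 DEFG\n' | keep_lowest_ordered
```

The extra `order` array costs one entry per distinct 1st field, and `next` skips the comparison on a key's first line, since there is nothing stored to compare it against yet.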