AWK remove duplicate line based on two conditions

I am trying to remove duplicate lines, where a duplicate is any line whose 1st field repeats. For each value of the 1st field, the line with the lowest 2nd field should be kept, and any line with the same 1st field and a higher 2nd field should be removed.

This is an example of my raw data:

1234     2     ABCD
3234     1     DEFG
1234     1     DEFG

Here is how it should be:

1234     1     DEFG
3234     1     DEFG

So far, based on this post, I came up with this script:

awk '{
    if($1 in a){
        if($2 < a[$1]){
            a[$1]= $2;
            r[$1]=$0;
        } else {
            a[$1]=$2;
            r[$1]=$0;
        }
    }
} end {for(x in r) print r[x]}'

But it returns no results.

I am still learning how to use awk, particularly associative arrays.

Any help is welcome. Thanks in advance!

Answer

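You can use this awk command. It stores a line when its 1st field has not been seen before, or when its 2nd field is lower than the value already stored for that 1st field: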

awk '!($1 in a) || $2 < a[$1] {a[$1]=$2; r[$1]=$0} END {for (i in r) print r[i]}' file

Output:

1234     1     DEFG
3234     1     DEFG
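
For reference, here is the same one-liner spread over several lines with comments, as a minimal sketch. The array names `min` and `line`, the temporary file and the trailing `sort` are my own choices for illustration, not part of the original command. It also hints at why the script in the question most likely prints nothing: `a` is only ever filled inside `if($1 in a)`, which is never true at the start, and lower-case `end` is not the special `END` pattern, so awk treats it as an empty variable.

```shell
# Sample input from the question, written to a temporary file (the name is arbitrary).
input=$(mktemp)
printf '%s\n' '1234     2     ABCD' '3234     1     DEFG' '1234     1     DEFG' > "$input"

result=$(awk '
    # Run the action when the key ($1) is new, or when $2 is lower than the stored value.
    !($1 in min) || $2 < min[$1] {
        min[$1]  = $2    # lowest 2nd field seen so far for this key
        line[$1] = $0    # the full line that carried that value
    }
    # END must be upper case; a lower-case "end" is just an uninitialized variable.
    END { for (key in line) print line[key] }
' "$input" | sort)

printf '%s\n' "$result"
rm -f "$input"
```

The order in which `for (key in line)` visits the keys is not specified by awk, so the sketch pipes the result through `sort` to get a stable order. Drop the `sort` if the order does not matter to you.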
User contributions licensed under: CC BY-SA