Skip to content
Advertisement

Awk script to sum multiple column if value in column1 is duplicate

Need your help to resolve the below query. I want to sum up the values for column3,column5,column6, column7,column9,column10 if value in column1 is duplicate. Also need to make duplicate rows as single row in output file and also need to put the value of column1 in column 8 in output file

input file

a|b|c|d|e|f|g|h|i|j
IN27201800024099|a|2.01|ad|5|56|6|rr|1|5
IN27201800023963|b|3|4|rt|67|6|61|ty|6
IN27201800024099|a|4|87|ad|5|6|1|rr|7.45
IN27201800024099|a|5|98|ad|5|6|1|rr|8
IN27201800023963|b|7|7|rt|5|5|1|ty|56
IN27201800024098|f|80|67|ty|6|6|1|4rght|765

output file

a|b|c|d|e|f|g|h|i|j|k
IN27201800024099|a|11.01|190|ad|66|18|3|rr|20.45|IN27201800024099
IN27201800023963|b|10|11|rt|72|11|62|ty|62|IN27201800023963
IN27201800024098|f|80|67|ty|6|6|1|4rght|765|IN27201800024098

Tried below code, but it is not working and also no clue how to complete the code to get correct output

awk 'BEGIN {FS=OFS="|"} FNR==1 {a[$1]+= (f3[key]+=$3;f5[key]+=$5;f6[key]+=$6;f7[key]+=$7;f9[key]+=$9;f10[key]+=$10;)} input.txt > output.txt

Advertisement

Answer

$ cat tst.awk
BEGIN {
    FS=OFS="|"
}
NR==1 {
    print $0, "h"
    next
}
{
    keys[$1]
    for (i=2; i<=NF; i++) {
        sum[$1,i] += $i
    }
}
END {
    for (key in keys) {
        printf "%s", key
        for (i=2; i<=NF; i++) {
            printf "%s%s", OFS, sum[key,i]
        }
        print OFS key
    }
}

$ awk -f tst.awk file
a|b|c|d|e|f|g|h
IN27201800023963|10|11|72|11|62|62|IN27201800023963
IN27201800024098|80|67|6|0|1|765|IN27201800024098
IN27201800024099|11.01|190|66|18|3|20.45|IN27201800024099

The above outputs the lines in random order, if you want them output in the same order as the key values were read in, it’s just a couple more lines of code:

$ cat tst.awk
BEGIN {
    FS=OFS="|"
}
NR==1 {
    print $0, "h"
    next
}
!seen[$1]++ {
    keys[++numKeys] = $1
}
{
    for (i=2; i<=NF; i++) {
        sum[$1,i] += $i
    }
}
END {
    for (keyNr=1; keyNr<=numKeys; keyNr++) {
        key = keys[keyNr]
        printf "%s", key
        for (i=2; i<=NF; i++) {
            printf "%s%s", OFS, sum[key,i]
        }
        print OFS key
    }
}

$ awk -f tst.awk file
a|b|c|d|e|f|g|h
IN27201800024099|11.01|190|66|18|3|20.45|IN27201800024099
IN27201800023963|10|11|72|11|62|62|IN27201800023963
IN27201800024098|80|67|6|0|1|765|IN27201800024098
User contributions licensed under: CC BY-SA
2 People found this is helpful
Advertisement