Skip to content
Advertisement

Find the average of multiple columns for each distinct variable in column 1

Hi I have a file with 6 columns and I wish to know the average of three of these (columns 2,3,4) and the sum of the last two (columns 5 and 6) for each unique variable in column one.

A1234 0.526 0.123 0.456 0.986 1.123
A1234 0.423 0.256 0.397 0.876 0.999
A1234 0.645 0.321 0.402 0.903 1.101
A1234 0.555 0.155 0.406 0.888 1.009
B5678 0.111 0.345 0.285 0.888 0.789
B5678 0.221 0.215 0.305 0.768 0.987 
B5678 0.336 0.289 0.320 0.789 0.921

I have come across code that will get the average for column 2 based on column one but is there anyway I can expand this across columns? Thanks

awk '{a[$1]+=$2; c[$1]++} END{for (i in a) printf "%d%s%.2fn", i, OFS, a[i]/c[i]}'

I would like the output to be in the following format ;each variable in column one will also have a different number of rows

A1234 0.53725 0.21375 0.41525 3.653 4.232
B5678 0.22233 0.283 0.30333 2.445 2.697

Advertisement

Answer

awk '{a[$1]+=$2;b[$1]+=$3;c[$1]+=$4;d[$1]+=$5;e[$1]+=$6;f[$1]++} END{for (i in a) print i,a[i]/f[i],b[i]/f[i],c[i]/f[i],d[i],e[i]}' file

O/p:

B5678 0.222667 0.283 0.303333 2.445 2.697
A1234 0.53725 0.21375 0.41525 3.653 4.232
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement