Skip to content
Advertisement

How can I sum only a column range using awk?

I have this file :

mm1 17601901 17630939 -1.3
mm1 20802968 20820312 1.1
mm1 24612407 24612700 -1.1
mm1 34459762 34475733 1.8

And I’am trying to sum the values in the forth column based in a range for example each 1 million using the second column as reference.

I’am using this code karakfa taught me:

awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000];sum+=$4} END{for(k in a) print k,a[k],sum}'

it outputs me this :

mm1 20500000  2150.1
mm1 34500000  2150.1
mm1 36500000  2150.1
mm1 37500000  2150.1

but it sums all column in the file, not only the range I’ve created.

if I use ++ instead, it sums the number of instances in that range :

awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000];++} END{for(k in a) print k,a[k],sum}'

what Iam doing wrong?

Advertisement

Answer

Your sum needs to be the array (one element for each range):

awk '{sum[$1 FS 1000000*int(($2-1)/1000000)+500000]+=$4} END{for(k in sum) print k,sum[k]}'
User contributions licensed under: CC BY-SA
6 People found this is helpful
Advertisement