I have this file :
mm1 17601901 17630939 -1.3 mm1 20802968 20820312 1.1 mm1 24612407 24612700 -1.1 mm1 34459762 34475733 1.8
And I’am trying to sum the values in the forth column based in a range for example each 1 million using the second column as reference.
I’am using this code karakfa taught me:
awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000];sum+=$4} END{for(k in a) print k,a[k],sum}'
it outputs me this :
mm1 20500000 2150.1 mm1 34500000 2150.1 mm1 36500000 2150.1 mm1 37500000 2150.1
but it sums all column in the file, not only the range I’ve created.
if I use ++ instead, it sums the number of instances in that range :
awk '{a[$1 FS 1000000*int(($2-1)/1000000)+500000];++} END{for(k in a) print k,a[k],sum}'
what Iam doing wrong?
Advertisement
Answer
Your sum
needs to be the array (one element for each range):
awk '{sum[$1 FS 1000000*int(($2-1)/1000000)+500000]+=$4} END{for(k in sum) print k,sum[k]}'