
Awk average of column by moving difference of grouping column variable

I have a file that looks like this:

1 snp1 0.0 4
1 snp2 0.2 6
1 snp3 0.3 4
1 snp4 0.4 3
1 snp5 0.5 5
1 snp6 0.6 6
1 snp7 1.3 5
1 snp8 1.3 3
1 snp9 1.9 4

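The file is sorted by column 3. I want the average of the 4th column, grouped by column 3 in steps of 0.5 units: each group starts at a row and includes every following row whose column 3 value is no more than 0.5 above that starting value. For example, the output should look like this: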

1 snp1 0.0 4.4
1 snp6 0.6 6.0
1 snp7 1.3 4.0
1 snp9 1.9 4.0

I can print the positions without the average like this:

awk 'NR==1 {pos=$3; print $0} $3>=pos+0.5{pos=$3; print $0}' input

But I can't figure out how to print the average of the 4th column. It would be great if someone could help me find a solution to this problem. Thanks!


Answer

Something like this, maybe:

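awk '
  # First row: start the first group (v is the group start from column 3).
  NR==1 {c1=$1; c2=$2; v=$3; n=1; s=$4; next}
  # Row more than 0.5 past the group start: print the average, start a new group.
  $3>v+0.5 {print c1, c2, v, s/n; c1=$1; c2=$2; v=$3; n=1; s=$4; next}
  # Otherwise the row belongs to the current group: update count and sum.
  {n+=1; s+=$4}
  # Print the last group.
  END {print c1, c2, v, s/n}
' input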
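
With plain print, whole-number averages come out as 6 and 4 rather than 6.0 and 4.0 as in the desired output. If the one-decimal format matters, a variant along these lines might help. It is only a sketch: the helper name emit and the temporary file are assumptions made for this example, and the grouping logic is the same as above.

```shell
# Recreate the sample data from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
1 snp1 0.0 4
1 snp2 0.2 6
1 snp3 0.3 4
1 snp4 0.4 3
1 snp5 0.5 5
1 snp6 0.6 6
1 snp7 1.3 5
1 snp8 1.3 3
1 snp9 1.9 4
EOF

result=$(awk '
  # emit() is a helper name chosen for this sketch; it prints the finished
  # group with the average rounded to one decimal place.
  function emit() { printf "%s %s %s %.1f\n", c1, c2, v, s / n }
  # First row: start the first group.
  NR == 1 { c1 = $1; c2 = $2; v = $3; n = 1; s = $4; next }
  # Row more than 0.5 past the group start: finish the group, start a new one.
  $3 > v + 0.5 { emit(); c1 = $1; c2 = $2; v = $3; n = 1; s = $4; next }
  # Otherwise accumulate into the current group.
  { n++; s += $4 }
  # Print the last group; the guard avoids dividing by zero on empty input.
  END { if (n) emit() }
' "$input")

echo "$result"
rm -f "$input"
```

On the sample data this should print the four lines from the question, including 6.0 and 4.0.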
User contributions licensed under: CC BY-SA