I have a dataset with many missing values as double slash (//). Part of the data is
input.txt 30 // 10 40 23 44 // // 31 // 54 // and so on
I would like calculate the average in each 6 rows interval without considering the missing values. I was trying with this, but not getting what I need.
awk '{if ($1 -ne "//") sum += $1} (NR%6)==0 {print(sum/NR)}' input.txt
It is giving
24.5 19.33
But the answer should come
29.4 42.5
Advertisement
Answer
You need to modify the awk
a bit to obtain the output as
$ awk '!///{sum += $1; count++} NR%6==0{print sum/count;sum=count=0}'
Test
$ awk '!///{sum += $1; count++} NR%6==0{print sum/count;sum=count=0}' file 29.4 42.5
What it does?
!///{sum += $1; count++}
!///
pattern checks if the line doesn’t contain/
{sum += $1; count++}
when the line doesn’t contain/
this action is performed, which sums the column 1,$1
tosum
and increments acount
which essentially tells us how many digitsawk
has seen till the next print.
NR%6==0{print sum/count;sum=count=0}
When the number of records,NR
is multiple of6
, we print the average,sum/count
and resets thecount
andsum
variables.
Edit
To print the last set of lines which may be less than 6
in number, we can use an END
block as
$ awk '!///{sum += $1; count++} NR%6==0{print sum/count;sum=count=0} END{print sum/count}' file 29.4 42.5
END{print sum/count}'
TheEND
block will be updated when the file reaches its end.
Edit 2
Edge case when no numbers occur in the 6 lines, the above script can lead to divide by zero error. The print statement can be formatted to handle this case.
print count ? (sum/count) : count;sum=count=0}
- This is a basic ternary operator which checks if count is non zero, if so prints the divided value , else print
count
, 0
Test
$ awk '!///{sum += $1; count++} NR%6==0{print count ? (sum/count) : count;sum=count=0}' file 29.4 42.5 0