Skip to content
Advertisement

How to handle missing values while calculating average in shell script?

I have a dataset with many missing values as double slash (//). Part of the data is

input.txt
30
//
10
40
23
44
//
//
31
//
54
// and so on

I would like calculate the average in each 6 rows interval without considering the missing values. I was trying with this, but not getting what I need.

awk '{if ($1 -ne "//") sum += $1} (NR%6)==0 {print(sum/NR)}' input.txt

It is giving

24.5
19.33

But the answer should come

29.4
42.5

Advertisement

Answer

You need to modify the awk a bit to obtain the output as

$ awk '!///{sum += $1; count++} NR%6==0{print sum/count;sum=count=0}'

Test

$ awk '!///{sum += $1; count++} NR%6==0{print sum/count;sum=count=0}' file
29.4
42.5

What it does?

  • !///{sum += $1; count++}

    • !/// pattern checks if the line doesn’t contain /

    • {sum += $1; count++} when the line doesn’t contain / this action is performed, which sums the column 1, $1 to sum and increments a count which essentially tells us how many digits awk has seen till the next print.

  • NR%6==0{print sum/count;sum=count=0} When the number of records, NR is multiple of 6, we print the average, sum/count and resets the count and sum variables.

Edit

To print the last set of lines which may be less than 6 in number, we can use an END block as

$ awk '!///{sum += $1; count++} NR%6==0{print sum/count;sum=count=0} END{print sum/count}' file
29.4
42.5
  • END{print sum/count}' The END block will be updated when the file reaches its end.

Edit 2

Edge case when no numbers occur in the 6 lines, the above script can lead to divide by zero error. The print statement can be formatted to handle this case.

print count ? (sum/count) : count;sum=count=0}
  • This is a basic ternary operator which checks if count is non zero, if so prints the divided value , else print count, 0

Test

$ awk '!///{sum += $1; count++} NR%6==0{print count ? (sum/count) : count;sum=count=0}' file
29.4
42.5
0
User contributions licensed under: CC BY-SA
8 People found this is helpful
Advertisement