
Calculate Median and Average of a Text File with Multiple Columns of Data

I have a file with multiple columns of data.

I need to calculate the median and average of the two columns.

Input:

67 65
56 43
87 87
90 65
95 34
87 76
85 65
87 89
73 34
72 56
98 33
95 84
84 79

Desired Output:

67 65 AVERAGE MEDIAN
56 43 AVERAGE MEDIAN
87 87 AVERAGE MEDIAN
90 65 AVERAGE MEDIAN
95 34 AVERAGE MEDIAN
87 76 AVERAGE MEDIAN
85 65 AVERAGE MEDIAN
87 89 AVERAGE MEDIAN
73 34 AVERAGE MEDIAN
72 56 AVERAGE MEDIAN
98 33 AVERAGE MEDIAN
95 84 AVERAGE MEDIAN
84 79 AVERAGE MEDIAN

I have tried:

cat master | awk 'BEGIN {c = 0; sum = 0;} $1 ~ /^[0-9]*(\.[0-9]*)?$/ {a[c++] = $1; sum += $1;} END {avg = sum / c; if( (c % 2) == 1 ) {median = a[ int(c/2) ];} else {median = ( a[c/2] + a[c/2-1] ) / 2;} OFS="\t"; print avg, median;}' master

This works for only one column of data.


Answer

Here I’m assuming you want to work column by column and calculate two sets of values (an average and a median for each column). Your desired output format does not reflect this assumption, though.

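It’s better to write functions for this task, since the same logic has to run on two columns. Also, your implementation is not correct to begin with: for the median you have to sort the values first. (The middle-index arithmetic itself is fine for your 0-based array; the code below uses 1-based indices because asort() renumbers the sorted values starting from 1.)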
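The sort-first fix can also be applied to the question's one-column approach without restructuring it. A minimal sketch, assuming POSIX `sort` and `awk` and whitespace-separated numeric columns: `sort -n` orders the lines by the chosen column before awk reads them, so the array is already ascending and the same 0-based middle indices as in the question's command pick the right values. The shell function name `col_stats` and its `FILE COL` arguments are hypothetical, not part of the original answer.

```shell
# col_stats FILE COL: print the average and median of column COL of FILE.
# Hypothetical helper that wraps the question's one-column awk logic.
col_stats() {
  # sort -n orders the lines numerically by column COL, so the awk array
  # a[0..c-1] is already ascending when the END block runs.
  sort -n -k"$2","$2" "$1" |
  awk -v col="$2" '
    $col ~ /^[0-9]+(\.[0-9]+)?$/ { a[c++] = $col; sum += $col }
    END {
      if (c == 0) exit 1                         # no numeric values found
      avg = sum / c
      if (c % 2 == 1) {
        median = a[int(c / 2)]                   # the single middle value
      } else {
        median = (a[c / 2 - 1] + a[c / 2]) / 2   # mean of the two middle values
      }
      OFS = "\t"
      print avg, median
    }'
}
```

For example, `col_stats master 2` should print the column-2 average and median separated by a tab (62.3077 and 65 for the sample input).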

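# GNU awk (gawk) only: asort() is a gawk extension.
# Save as e.g. stats.awk and run: gawk -f stats.awk master
{
  c1[NR] = $1
  c2[NR] = $2
}
END {
  print mean(c1) FS mean(c2)
  print median(c1) FS median(c2)
}

# Names after the real parameter are local variables; without them, sum
# and k would keep their values from the first call and skew the second mean.
function mean(arr,    i, sum, k) {
  for (i in arr) { sum += arr[i]; k++ }
  return sum / k
}

# asort() sorts the values in place and renumbers them 1..n,
# hence the 1-based middle indices below.
function median(arr,    n, mid) {
  n = asort(arr)
  if (n % 2) {
    mid = (n + 1) / 2
    return arr[mid]
  } else {
    mid = n / 2
    return (arr[mid] + arr[mid + 1]) / 2
  }
}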
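asort() exists only in GNU awk, so the program above stops with an error under mawk or BSD awk. A minimal portable sketch, assuming any POSIX awk and whitespace-separated numeric columns without blank lines, replaces asort() with a hand-written insertion sort. The names `col_summary` and `isort` are hypothetical and not part of the original answer.

```shell
# col_summary FILE...: print both column means on one line, then both medians.
# Hypothetical helper; works with awks that lack asort() (e.g. mawk, BSD awk).
col_summary() {
  awk '
    { c1[NR] = $1; c2[NR] = $2 }
    END {
      print mean(c1, NR) FS mean(c2, NR)
      print median(c1, NR) FS median(c2, NR)
    }

    # Extra parameters act as local variables in awk.
    function mean(arr, n,    i, sum) {
      for (i = 1; i <= n; i++) sum += arr[i]
      return sum / n
    }

    # Insertion sort of arr[1..n] by numeric value, in place.
    function isort(arr, n,    i, j, v) {
      for (i = 2; i <= n; i++) {
        v = arr[i]
        for (j = i - 1; j >= 1 && arr[j] > v; j--) arr[j + 1] = arr[j]
        arr[j + 1] = v
      }
    }

    # Sort first, then take the middle value or the mean of the two middle values.
    function median(arr, n) {
      isort(arr, n)
      if (n % 2) return arr[(n + 1) / 2]
      return (arr[n / 2] + arr[n / 2 + 1]) / 2
    }
  ' "$@"
}
```

With the question's sample saved as `master`, `col_summary master` should print `82.7692 62.3077` on the first line and `87 65` on the second, the same as the gawk version. Insertion sort is quadratic, which is fine for small files; for large ones, piping through `sort -n` is the better choice.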
User contributions licensed under: CC BY-SA