I have a file with multiple columns of data.
I need to calculate the median and average of the two columns.
Input:
67 65
56 43
87 87
90 65
95 34
87 76
85 65
87 89
73 34
72 56
98 33
95 84
84 79
Desired Output:
67 65 AVERAGE MEDIAN
56 43 AVERAGE MEDIAN
87 87 AVERAGE MEDIAN
90 65 AVERAGE MEDIAN
95 34 AVERAGE MEDIAN
87 76 AVERAGE MEDIAN
85 65 AVERAGE MEDIAN
87 89 AVERAGE MEDIAN
73 34 AVERAGE MEDIAN
72 56 AVERAGE MEDIAN
98 33 AVERAGE MEDIAN
95 84 AVERAGE MEDIAN
84 79 AVERAGE MEDIAN
I have tried
cat master | awk 'BEGIN {c = 0; sum = 0;} $1 ~ /^[0-9]*(\.[0-9]*)?$/ {a[c++] = $1; sum += $1;} END {avg = sum / c; if( (c % 2) == 1 ) {median = a[ int(c/2) ];} else {median = ( a[c/2] + a[c/2-1] ) / 2;} OFS="\t"; print avg, median;}' master
This works for only one column of data.
Answer
I'm assuming you want to work column by column and calculate two sets of values, one mean and one median per column. Your desired output doesn't reflect that assumption, though.
It's better to write functions for this task. However, your implementation is not correct to begin with: for the median you have to sort the values first. The midpoint calculation is also not correct.
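As a quick check of why the sort matters, here is a small sketch that uses the first-column values from the question. The helper name `middle` is made up for illustration, and it only handles an odd number of values.

```shell
# First-column values from the question's sample input.
values='67 56 87 90 95 87 85 87 73 72 98 95 84'

# middle (hypothetical helper): print the middle line of its input.
# Only meaningful for an odd number of lines; kept short on purpose.
middle() { awk '{ v[NR] = $1 } END { print v[(NR + 1) / 2] }'; }

printf '%s\n' $values | middle            # prints 85: the middle row, not the median
printf '%s\n' $values | sort -n | middle  # prints 87: the median
```

The two results differ because the middle row of unsorted data is just whatever value happens to sit there.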
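# Requires GNU awk (gawk): asort() is a gawk extension.
{
    c1[NR] = $1
    c2[NR] = $2
}
END {
    print mean(c1) FS mean(c2)
    print median(c1) FS median(c2)
}
# Extra parameters (i, sum, k) act as local variables, so the
# totals do not carry over from one call to the next.
function mean(arr,    i, sum, k) {
    for (i in arr) { sum += arr[i]; k++ }
    return sum / k
}
function median(arr,    sorted, n, mid) {
    n = asort(arr, sorted)   # sorted copy, indexed 1..n
    if (n % 2) {
        mid = (n + 1) / 2
        return sorted[mid]
    } else {
        mid = n / 2
        return (sorted[mid] + sorted[mid + 1]) / 2
    }
}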
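If gawk is not available, one possible alternative is to let `sort -n` do the ordering and keep the awk part portable. This is only a sketch: `col_stats` is a hypothetical helper name, and it assumes whitespace-separated numeric columns with no header line.

```shell
# col_stats COLUMN FILE: print "mean median" for one column of FILE.
# (col_stats is a hypothetical name, not part of the original answer.)
col_stats() {
    sort -n -k "$1,$1" "$2" | awk -v col="$1" '
        { v[NR] = $col; sum += $col }      # rows arrive sorted by that column
        END {
            if (NR == 0) exit              # empty input: print nothing
            mid = int((NR + 1) / 2)
            median = (NR % 2) ? v[mid] : (v[mid] + v[mid + 1]) / 2
            print sum / NR, median
        }'
}

# Usage, assuming the data file is called master as in the question:
#   col_stats 1 master
#   col_stats 2 master
```

With the sample input from the question, the first call should print `82.7692 87` and the second `62.3077 65`. Sorting the file once per column costs an extra pass, but it avoids relying on `asort()`.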