I have a gridded data file in column format as:
ifile.txt x y value 20.5 20.5 -4.1 21.5 20.5 -6.2 22.5 20.5 0.0 20.5 21.5 1.2 21.5 21.5 4.3 22.5 21.5 6.0 20.5 22.5 7.0 21.5 22.5 10.4 22.5 22.5 16.7
I would like to convert it to matrix format as:
ofile.txt 20.5 21.5 22.5 20.5 -4.1 1.2 7.0 21.5 -6.2 4.3 10.4 22.5 0.0 6.0 16.7
Where top 20.5 21.5 22.5
indicate y and side values indicate x and the inside values indicate the corresponding grid values.
I found a similar question here Convert a 3 column file to matrix format but the script is not working in my case.
The script is
awk '{ h[$1,$2] = h[$2,$1] = $3 } END { for(i=1; i<=$1; i++) { for(j=1; j<=$2; j++) printf h[i,j] OFS printf "n" } }' ifile
Advertisement
Answer
The following awk
script handles :
- any size of matrix
- no relation between row and column indices so it keeps track of them separately.
- If a certain row column index does not appear, the value will default to zero.
This is done in this way:
awk ' BEGIN{PROCINFO["sorted_in"] = "@ind_num_asc"} (NR==1){next} {row[$1]=1;col[$2]=1;val[$1" "$2]=$3} END { printf "%8s",""; for (j in col) { printf "%8.3f",j }; printf "n" for (i in row) { printf "%8.3f",i; for (j in col) { printf "%8.3f",val[i" "j] }; printf "n" } }' <file>
How does it work:
PROCINFO["sorted_in"] = "@ind_num_asc"
, states that all arrays are sorted numerically by index.(NR==1){next}
: skip the first line{row[$1]=1;col[$2]=1;val[$1" "$2]=$3}
, process the line by storing the row and column index and accompanying value.- The end statement does all the printing.
This outputs:
20.500 21.500 22.500 20.500 -4.100 1.200 7.000 21.500 -6.200 4.300 10.400 22.500 0.000 6.000 16.700
note: the usage of PROCINFO
is a gawk
feature.
However, if you make a couple of assumptions, you can do it much shorter:
- the file contains all possible entries, no missing values
- you do not want the indices of the rows and columns printed out:
- the indices are sorted in column-major-order
The you can use the following short versions:
sort -g <file> | awk '($1+0!=$1){next} ($1!=o)&&(NR!=1){printf "n"} {printf "%8.3f",$3; o=$1 }'
which outputs
-4.100 1.200 7.000 -6.200 4.300 10.400 0.000 6.000 16.700
or for the transposed:
awk '(NR==1){next} ($2!=o)&&(NR!=2){printf "n"} {printf "%8.3f",$3; o=$2 }' <file>
This outputs
-4.100 -6.200 0.000 1.200 4.300 6.000 7.000 10.400 16.700