Recently, I had to sort several files according to records’ ID; the catch was that there can be several types of records, and in each of those the field I had to use for sorting is on a different position. The fields, however, are easily identifiable thanks to key=value
structure. To show a simple sample of the general structure:
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC fieldD=valueD|recordType=B|id=1|fieldE=valueE fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
I came up with a pipeline as follows, which did the job:
awk -F'[|=]' '{for(i=1; i<=NF; i++) {if($i ~ "id") {i++; print $i"?"$0} }}' tester.txt | sort -n | awk -F'?' '{print $2}'
In other words the algorithm is as follows:
- Split the record by both field and key-value separators (
|
and=
) - Iterate through the elements and search for the
id
key - Print the next element (value of
id
key), a separator, and the whole line - Sort numerically
- Remove prepended identifier to preserve records’ structure
Processing the sample gives the output:
fieldD=valueD|recordType=B|id=1|fieldE=valueE fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
Is there a way, though, to do this task using single awk
command?
Advertisement
Answer
You may try this gnu-awk
code to to this in a single command:
awk -F'|' '{ for(i=1; i<=NF; ++i) if ($i ~ /^id=/) { a[gensub(/^id=/, "", 1, $i)] = $0 break } } END { PROCINFO["sorted_in"] = "@ind_num_asc" for (i in a) print a[i] }' file fieldD=valueD|recordType=B|id=1|fieldE=valueE fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3
We are using |
as field delimiter and when there is a column name starting with id=
we store it in array a
with index as text after =
and value as the full record.
Using PROCINFO["sorted_in"] = "@ind_num_asc"
we sort array a
using numerical value of index and then in for loop we print value part to get the sorted output.