Skip to content
Advertisement

Sorting a file using fields with specific value

Recently, I had to sort several files according to records’ ID; the catch was that there can be several types of records, and in each of those the field I had to use for sorting is on a different position. The fields, however, are easily identifiable thanks to key=value structure. To show a simple sample of the general structure:

fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3

I came up with a pipeline as follows, which did the job:

awk -F'[|=]' '{for(i=1; i<=NF; i++) {if($i ~ "id") {i++; print $i"?"$0} }}' tester.txt | sort -n | awk -F'?' '{print $2}'

In other words the algorithm is as follows:

  1. Split the record by both field and key-value separators (| and =)
  2. Iterate through the elements and search for the id key
  3. Print the next element (value of id key), a separator, and the whole line
  4. Sort numerically
  5. Remove prepended identifier to preserve records’ structure

Processing the sample gives the output:

fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3

Is there a way, though, to do this task using single awk command?

Advertisement

Answer

You may try this gnu-awk code to to this in a single command:

awk -F'|' '{
   for(i=1; i<=NF; ++i)
      if ($i ~ /^id=/) {
         a[gensub(/^id=/, "", 1, $i)] = $0
         break
      }
}
END {
   PROCINFO["sorted_in"] = "@ind_num_asc"
   for (i in a)
      print a[i]
}' file

fieldD=valueD|recordType=B|id=1|fieldE=valueE
fieldA=valueA|fieldB=valueB|recordType=A|id=2|fieldC=valueC
fieldF=valueF|fieldG=valueG|fieldH=valueH|recordType=C|id=3

We are using | as field delimiter and when there is a column name starting with id= we store it in array a with index as text after = and value as the full record.

Using PROCINFO["sorted_in"] = "@ind_num_asc" we sort array a using numerical value of index and then in for loop we print value part to get the sorted output.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement