I have 1 text file, which is test1.txt.
test1.txt contains the following:
Input:
    ##[A1] [B1] [T1] [V1] [T2] [V2] [T3] [V3] [T4] [V4]## --> headers
         1 1000    0  100   10  200   20  300   30  400
                  40  500   50  600   60  700   70  800
           1010    0  101   10  201   20  301   30  401
                  40  501   50  601
         2 1000    0  110   15  210   25  310   35  410
                  45  510   55  610   65  710
           1010    0  150   10  250   20  350   30  450
                  40  550
Condition:
A1 and B1 -> for each A1 + (B1 + [Tn + Vn])
A1 should be in 1 column.
B1 should be in 1 column.
T1,T2,T3 and T4 should be in 1 column.
V1,V2,V3 and V4 should be in 1 column.
How do I rearrange it so that it looks like the output below?
Desired Output:
    ## A1 B1 Tn Vn ## --> headers
     1 1000  0 100
            10 200
            20 300
            30 400
            40 500
            50 600
            60 700
            70 800
       1010  0 101
            10 201
            20 301
            30 401
            40 501
            50 601
     2 1000  0 110
            15 210
            25 310
            35 410
            45 510
            55 610
            65 710
       1010  0 150
            10 250
            20 350
            30 450
            40 550
Here is my current code:
First Attempt:
Input
    cat test1.txt | awk '
    {
        a=$1
        b=$2
    }
    {
        for(i=1; i<=5; i=i+1) {
            t=substr($0,11+i*10,5)
            v=substr($0,16+i*10,5)
            if( t ~ /^ +[0-9]+$/ || t ~ /^[0-9]+$/ || t ~ /^ +[0-9]+ +$/ ){
                printf "%7s %7d %8d %8d \n",a,b,t,v
            }
        }
    }' | less
Output:
          1    1000      400        0
         40     500      800        0
       1010       0      401        0
          2    1000      410        0
       1010       0      450        0
I’m trying to use a simple awk command, but I still can’t get the result.
Can anyone help me with this?
Thanks,
Am
Answer
This is a rather tricky problem that can be handled a number of ways. Whether you use bash, perl or awk, you will need to handle the number of fields in some semi-generic way instead of just hardcoding values for your example.
Using bash, so long as you can rely on an even number of fields in all lines (except for the lines that start with a sole initial value, e.g. 1010), you can accommodate the number of fields in a reasonably generic way. For the lines with 1, 2, etc., you know your initial output will contain 4 fields. For lines with 1010, etc., you know the output will contain an initial 3 fields. For the remaining values you are simply outputting pairs.
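As a rough sketch of that classification, each line can be sorted into one of the three cases by its field count alone. The helper name classify_line and its labels are made up for illustration, and the sketch assumes a continuation line never happens to have nfields or nfields - 1 fields:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original answer): classify a data line
# by its field count relative to nfields, the number of columns named in the
# header (10 in the sample file).
classify_line() {
    local nfields=$1 line=$2
    local -a arr
    read -r -a arr <<< "$line"              # split the line on whitespace
    case "${#arr[@]}" in
        "$nfields")         echo "full"  ;; # A1 B1 followed by T/V pairs
        "$((nfields - 1))") echo "no-A1" ;; # B1 followed by T/V pairs
        *)                  echo "pairs" ;; # continuation line: T/V pairs only
    esac
}

classify_line 10 "1 1000 0 100 10 200 20 300 30 400"   # prints: full
classify_line 10 "1010 0 101 10 201 20 301 30 401"     # prints: no-A1
classify_line 10 "40 501 50 601"                       # prints: pairs
```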
The tricky part is handling the alignment. This is where printf helps: it allows you to set the field width with a parameter using the form "%*s", where the conversion specifier expects the next parameter to be an integer value specifying the field width, followed by a parameter for the string conversion itself. It takes a little gymnastics, but you could do something like the following in bash itself:
(note: edit to match your output header format)
    #!/bin/bash

    declare -i nfields wd=6     ## total no. fields, printf field-width modifier

    while read -r line; do      ## read each line (preserve for header line)
        arr=($line)             ## separate into array
        first=${arr[0]}         ## check for '#' in first line for header
        if [ "${first:0:1}" = '#' ]; then
            nfields=$((${#arr[@]} - 2))     ## no. fields in header
            printf "## A1 B1 Tn Vn ## --> headers\n"    ## new header
            continue
        fi
        fields=${#arr[@]}       ## fields in line
        case "$fields" in
            $nfields )          ## fields -eq nfields?
                cnt=4           ## handle 1st 4 values in line
                printf " "
                for ((i=0; i < cnt; i++)); do
                    if [ "$i" -eq '2' ]; then
                        printf "%*s" "5" "${arr[i]}"
                    else
                        printf "%*s" "$wd" "${arr[i]}"
                    fi
                done
                echo
                for ((i = cnt; i < $fields; i += 2)); do    ## handle rest
                    printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                done
                ;;
            $((nfields - 1)) )  ## one less than nfields
                cnt=3           ## handle 1st 3 values
                printf " %*s%*s" "$wd" " "
                for ((i=0; i < cnt; i++)); do
                    if [ "$i" -eq '1' ]; then
                        printf "%*s" "5" "${arr[i]}"
                    else
                        printf "%*s" "$wd" "${arr[i]}"
                    fi
                done
                echo
                for ((i = cnt; i < $fields; i += 2)); do    ## handle rest
                    if [ "$i" -eq '0' ]; then
                        printf "%*s%*s%*s\n" "$((wd+1))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                    else
                        printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                    fi
                done
                ;;
            * )                 ## all other lines format as pairs
                for ((i = 0; i < $fields; i += 2)); do
                    printf "%*s%*s%*s\n" "$((2*wd))" " " "$wd" "${arr[i]}" "$wd" "${arr[$((i+1))]}"
                done
                ;;
        esac
    done
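If the "%*s" form is unfamiliar, it can be tried on its own. This is a minimal sketch only; pad_left is a hypothetical helper name and is not used by the script above:

```shell
#!/bin/bash
# pad_left WIDTH TEXT: print TEXT right-aligned in a field WIDTH characters
# wide. The '*' in "%*s" takes the width from the argument list, so the width
# can live in a variable instead of being baked into the format string.
pad_left() {
    printf '%*s' "$1" "$2"
}

wd=6
printf '[%s]\n' "$(pad_left "$wd" 10)"      # prints: [    10]

# A blank indent of 2*wd characters followed by one T/V pair, which is the
# same shape as the continuation rows produced by the script:
printf '%*s%*s%*s\n' "$((2 * wd))" "" "$wd" 10 "$wd" 200
```

Each "%*s" consumes two arguments, the width first and then the string, which is why the argument list in the script looks twice as long as the format string suggests.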
The script reads from standard input rather than opening a file itself, so just redirect the input file to your script (if you would rather pass a filename as an argument, redirect that file into the while read ... loop instead).
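A minimal sketch of the filename variant follows. The wrapper function process_input is hypothetical, and its loop body only echoes each line as a stand-in for the formatting logic:

```shell
#!/bin/bash
# Hypothetical wrapper: read from the file named in the first argument, or
# from standard input when no argument is given.
process_input() {
    local infile=${1:-/dev/stdin}
    local line
    # The "|| [ -n "$line" ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "$line"       # placeholder for the formatting logic
    done < "$infile"
}
```

Calling process_input "$1" at the end of a script would then accept either bash script.sh file or bash script.sh <file.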
Example Use/Output
    $ bash text1format.sh <dat/text1.txt
    ## A1 B1 Tn Vn ## --> headers
          1  1000    0   100
                    10   200
                    20   300
                    30   400
                    40   500
                    50   600
                    60   700
                    70   800
             1010    0   101
                    10   201
                    20   301
                    30   401
                    40   501
                    50   601
          2  1000    0   110
                    15   210
                    25   310
                    35   410
                    45   510
                    55   610
                    65   710
             1010    0   150
                    10   250
                    20   350
                    30   450
                    40   550
As between awk and bash, awk will generally be faster, but here with formatted output, it may be closer than usual. Look things over and let me know if you have questions.
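For comparison, here is a rough sketch of how the same approach might look in awk, wrapped in a shell function. The function name format_with_awk is made up for illustration, the field widths (6, 6, 5, 6) are copied from the bash script, and it makes the same assumption about field counts (header fields minus 2, then lines with nfields, nfields - 1, or plain pairs):

```shell
#!/bin/bash
# Hypothetical awk counterpart to the bash script; reads the named files, or
# standard input when called without arguments.
format_with_awk() {
    awk '
    NF == 0 { next }                        # skip blank lines
    $1 ~ /^#/ {                             # header line
        nfields = NF - 2                    # ignore the "-->" and "headers" words
        print "## A1 B1 Tn Vn ## --> headers"
        next
    }
    {
        if (NF == nfields) {                # A1 B1 T V, then pairs
            printf " %6s%6s%5s%6s\n", $1, $2, $3, $4
            start = 5
        } else if (NF == nfields - 1) {     # B1 T V, then pairs
            printf " %6s%6s%5s%6s\n", "", $1, $2, $3
            start = 4
        } else {                            # continuation line: pairs only
            start = 1
        }
        for (i = start; i < NF; i += 2)     # remaining T/V pairs, one per row
            printf "%12s%6s%6s\n", "", $i, $(i + 1)
    }' "$@"
}
```

It can be called as format_with_awk dat/text1.txt or fed through a pipe. Since awk splits each line into fields itself, no array handling is needed.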