Using awk to parse and transform the following log

I have a log like this:

DEBUG: Worker thread (#12) initialized
DEBUG: Worker thread (#19) initialized
DEBUG: Worker thread (#9) initialized
DEBUG: Worker thread (#15) initialized
DEBUG: Worker thread (#3) initialized
DEBUG: Worker thread (#17) initialized
DEBUG: Worker thread (#14) initialized
DEBUG: Worker thread (#16) initialized
Threads started!

[ 5s ] thds: 20 tps: 35265.85 qps: 35265.85 (r/w/o: 0.00/35265.85/0.00) lat (ms,99%): 2.52 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 20 tps: 35965.67 qps: 35965.67 (r/w/o: 0.00/35965.67/0.00) lat (ms,99%): 2.03 err/s: 0.00 reconn/s: 0.00
...

I want to parse this log file and get all the following lines:

[ 5s ] thds: 20 tps: 35265.85 qps: 35265.85 (r/w/o: 0.00/35265.85/0.00) lat (ms,99%): 2.52 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 20 tps: 35965.67 qps: 35965.67 (r/w/o: 0.00/35965.67/0.00) lat (ms,99%): 2.03 err/s: 0.00 reconn/s: 0.00
....

Then I want to transform those lines into the following format for plotting:

5,35265.85
10,35965.67
...

Here is my awk code:

#!/usr/bin/env bash
awk '
BEGIN {
    printf "#time,tps\n";
}
/^\[ [0-9]{1,4}[s]? \]/ { # regex for [ 1050s ]
    printf "%s,%s\n", substr($2,1, length($2)-1), $7
}
' "$@"

What I don't like about this solution is that I have to manually count the index of each token awk generates. I would prefer something like "the first token after the string tps:", which would be more general and easier to maintain.

My question is: can I do that using awk? Or is there a better way to handle my situation?

Answer

Is this what you’re trying to do?

$ awk -v OFS=',' '/^\[/{print $2+0, $5, $7, $9}' file
5,20,35265.85,35265.85
10,20,35965.67,35965.67
15,20,35233.82,35233.82
20,20,35239.05,35239.25
25,20,37188.61,37188.41
30,20,36622.32,36622.32
35,20,36538.27,36538.27
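
A side note on `$2+0` in the command above: adding zero makes awk treat the field as a number, and awk's string-to-number conversion keeps only the leading numeric part, so `5s` becomes `5`. A minimal sketch of that behavior in isolation (the `strip_unit` function name is made up for illustration and is not part of the answer):

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one value per line on stdin and print its
# leading numeric part, e.g. "5s" -> 5, "10s" -> 10.
strip_unit() {
    # "$1 + 0" forces numeric context; awk ignores the trailing "s".
    awk '{ print $1 + 0 }'
}

printf '5s\n10s\n' | strip_unit
```

This avoids the `substr($2, 1, length($2)-1)` arithmetic from the question, at the cost of silently yielding 0 for a field that does not start with a number.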

or maybe this if you want headers:

$ awk -F'[ :]+' -v OFS=',' '/^\[/{ if (!doneHdr++) print "time", $4, $6, $8; print $2+0, $5, $7, $9}' file
time,thds,tps,qps
5,20,35265.85,35265.85
10,20,35965.67,35965.67
15,20,35233.82,35233.82
20,20,35239.05,35239.25
25,20,37188.61,37188.41
30,20,36622.32,36622.32
35,20,36538.27,36538.27

or this:

$ awk -F'[ :]+' -v OFS=',' -v tgts='time thds tps qps' '
    BEGIN {
        numTags = split(tgts,tags)
        for (tagNr=1; tagNr<=numTags; tagNr++) {
            printf "%s%s", tags[tagNr], (tagNr<numTags ? OFS : ORS)
        }
    }
    /^\[/ {
        for (i=1; i<=NF; i++) {
            f[$i] = $(i+1)
            sub(/[^0-9]+$/,"",f[$i])
        }
        f["time"] = f["["]

        for (tagNr=1; tagNr<=numTags; tagNr++) {
            printf "%s%s", f[tags[tagNr]], (tagNr<numTags ? OFS : ORS)
        }
    }
' file
time,thds,tps,qps
5,20,35265.85,35265.85
10,20,35965.67,35965.67
15,20,35233.82,35233.82
20,20,35239.05,35239.25
25,20,37188.61,37188.41
30,20,36622.32,36622.32
35,20,36538.27,36538.27

I ran the above using your original sample input:

$ cat file
DEBUG: Worker thread (#12) initialized
DEBUG: Worker thread (#19) initialized
DEBUG: Worker thread (#9) initialized
DEBUG: Worker thread (#15) initialized
DEBUG: Worker thread (#3) initialized
DEBUG: Worker thread (#17) initialized
DEBUG: Worker thread (#14) initialized
DEBUG: Worker thread (#16) initialized
Threads started!

[ 5s ] thds: 20 tps: 35265.85 qps: 35265.85 (r/w/o: 0.00/35265.85/0.00) lat (ms,99%): 2.52 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 20 tps: 35965.67 qps: 35965.67 (r/w/o: 0.00/35965.67/0.00) lat (ms,99%): 2.03 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 20 tps: 35233.82 qps: 35233.82 (r/w/o: 0.00/35233.82/0.00) lat (ms,99%): 2.26 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 20 tps: 35239.05 qps: 35239.25 (r/w/o: 0.00/35239.25/0.00) lat (ms,99%): 2.11 err/s: 0.00 reconn/s: 0.00
[ 25s ] thds: 20 tps: 37188.61 qps: 37188.41 (r/w/o: 0.00/37188.41/0.00) lat (ms,99%): 1.86 err/s: 0.00 reconn/s: 0.00
[ 30s ] thds: 20 tps: 36622.32 qps: 36622.32 (r/w/o: 0.00/36622.32/0.00) lat (ms,99%): 1.96 err/s: 0.00 reconn/s: 0.00
[ 35s ] thds: 20 tps: 36538.27 qps: 36538.27 (r/w/o: 0.00/36538.27/0.00) lat (ms,99%): 2.00 err/s: 0.00 reconn/s: 0.00
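
If you only need the "first token after `tps:`" behavior from the question, without building the full lookup array, a smaller sketch is to scan the fields for the label and print the field that follows it. The `extract_metric` function name and its argument order are assumptions made for this illustration; the time is still taken from `$2`, since it has no label in this log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "time,value" for every report line, where
# value is the field that follows "LABEL:".
# Usage: extract_metric LABEL [FILE...]   (reads stdin if no FILE is given)
extract_metric() {
    local label=$1
    shift
    awk -v label="$label" -v OFS=',' '
        BEGIN {
            key = label ":"          # fields look like "tps:" with default FS
            print "#time", label     # header in the style used in the question
        }
        /^\[/ {                      # only the "[ 5s ] ..." report lines
            for (i = 1; i < NF; i++) {
                if ($i == key) {
                    print $2 + 0, $(i + 1)   # "5s" + 0 -> 5
                    break
                }
            }
        }
    ' "$@"
}
```

Called as `extract_metric tps file`, this should print `#time,tps` followed by `5,35265.85`, `10,35965.67`, and so on for the sample input. It only finds labels written as `name:` directly before their value (`thds`, `tps`, `qps`, `err/s`, `reconn/s`), so `lat (ms,99%):` would need separate handling.
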
User contributions licensed under: CC BY-SA