
How to use awk regex sort by query string value?

I have a log file with example row:

xxx.xxx.xxx.xxx - - [07/Jun/2015:14:18:39 +0000] "GET /file/?t=70 HTTP/1.1" 200 35 "http://1234.com/p/talk-about-owning-it/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome"

The 7th column of each row looks like this:

/file/?t=70
/file/?t=4785&k=1
/file/?t=120
/file/?t=95&k=0
/file/?t=120
/file/?t=120&k=0
/file/?t=95&k=1
...

I want the output sorted in decreasing order by the number of lines containing each unique value of t.

Desired OUTPUT:

120  -  3
95   -  2
4785 -  1
70   -  1
...

I am using awk, but it is not giving the desired output:

awk -F'[=&]' '{print $2}' /var/log/nginx/t.access.log | sort | uniq -c | sort -rn

It outputs everything after the 7th column as well, which is not what I want. What am I doing wrong? Any suggestions, please.
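A quick way to see why this happens (a sketch, reusing the sample line from above): with `FS='[=&]'`, spaces are not field separators, and there is no later `=` or `&` on the line, so `$2` runs from just after `t=` all the way to the end of the line.

```shell
# Minimal reproduction of the problem. With FS='[=&]' the sample log line
# splits into only two fields: everything before "t=" and everything after it,
# because no "=" or "&" occurs later on the line.
line='xxx.xxx.xxx.xxx - - [07/Jun/2015:14:18:39 +0000] "GET /file/?t=70 HTTP/1.1" 200 35 "http://1234.com/p/talk-about-owning-it/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome"'
printf '%s\n' "$line" | awk -F'[=&]' '{print $2}'
# prints: 70 HTTP/1.1" 200 35 "http://1234.com/p/talk-about-owning-it/" ...
```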


Answer

Using your 1 sample input line:

$ awk '{split($7,a,/[=&]/); print a[2]}' file | sort | uniq -c | sort -rn
      1 70

or if the rest of your input lines follow EXACTLY the format of that one line:

$ awk -F'[=& ]' '{print $8}' file | sort | uniq -c | sort -rn
      1 70
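To see why the field number becomes 8 once a space is added to the separator set (a sketch, again using the one sample line): the request path `/file/?t` lands in `$7` and the value of t in `$8`.

```shell
# With FS='[=& ]' the sample line splits on "=", "&" and spaces, so the
# fields up to the t value are:
#   $1 xxx.xxx.xxx.xxx  $2 -  $3 -  $4 [07/Jun/2015:14:18:39  $5 +0000]
#   $6 "GET  $7 /file/?t  $8 70
line='xxx.xxx.xxx.xxx - - [07/Jun/2015:14:18:39 +0000] "GET /file/?t=70 HTTP/1.1" 200 35 "http://1234.com/p/talk-about-owning-it/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome"'
printf '%s\n' "$line" | awk -F'[=& ]' '{print $7, $8}'
# prints: /file/?t 70
```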

or entirely in awk:

$ cat tst.awk
{
    split($7,a,/[=&]/)
    sum[a[2]]++
}
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (val in sum) {
        print val "\t- " sum[val]
    }
}
$ awk -f tst.awk file
70      - 1

or:

$ cat tst.awk
BEGIN { FS="[=& ]" }
{ sum[$8]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (val in sum) {
        print val "\t- " sum[val]
    }
}
$
$ awk -f tst.awk file
70      - 1

The above uses GNU awk 4.* for PROCINFO["sorted_in"] to sort the output. If you don’t have that, remove that line and pipe the output to sort -k3,3rn (sorting numerically, descending, on the count field). Either way you do not need the intermediate | sort | uniq -c.
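For a POSIX awk without PROCINFO["sorted_in"], that pipeline looks like this (a sketch: here `file` holds one request path per line, as in the question's 7th-column listing, so we split `$1` instead of `$7`):

```shell
# Count occurrences of each t value in awk, then sort by the count
# (field 3 of "value - count") with sort(1) instead of gawk's sorted_in.
cat > file <<'EOF'
/file/?t=70
/file/?t=4785&k=1
/file/?t=120
/file/?t=95&k=0
/file/?t=120
/file/?t=120&k=0
/file/?t=95&k=1
EOF
awk '{split($1,a,/[=&]/); sum[a[2]]++}
     END{for (v in sum) print v " - " sum[v]}' file |
sort -k3,3rn
# first lines: 120 - 3
#              95 - 2
```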

User contributions licensed under: CC BY-SA