I have a log file; an example row looks like this:
xxx.xxx.xxx.xxx - - [07/Jun/2015:14:18:39 +0000] "GET /file/?t=70 HTTP/1.1" 200 35 "http://1234.com/p/talk-about-owning-it/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome"
The 7th column of each row looks like this:
/file/?t=70
/file/?t=4785&k=1
/file/?t=120
/file/?t=95&k=0
/file/?t=120
/file/?t=120&k=0
/file/?t=95&k=1
...
The output should list the unique values of t, sorted in decreasing order by the number of lines containing each value.
Desired OUTPUT:
120 - 3
95 - 2
4785 - 1
70 - 1
...
I am using awk, but it is not giving the desired output:
awk -F'[=&]' '{print $2}' /var/log/nginx/t.access.log | sort | uniq -c | sort -rn
It outputs all columns after the 7th, which is not required.
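For example, on the sample row above it prints everything from the first = to the end of the line:

awk -F'[=&]' '{print $2}' /var/log/nginx/t.access.log
70 HTTP/1.1" 200 35 "http://1234.com/p/talk-about-owning-it/" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome"

What am I doing wrong? Any suggestions please.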
Answer
Using your 1 sample input line:
$ awk '{split($7,a,/[=&]/); print a[2]}' file | sort | uniq -c | sort -rn
      1 70
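This works because split() carves up only the 7th field on = and &, so the rest of the line never comes into play; a[2] is just the value of t.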
or if the rest of your input lines follow EXACTLY the format of that one line:
$ awk -F'[=& ]' '{print $8}' file | sort | uniq -c | sort -rn
      1 70
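Note that with =, &, and blank as separators applied to the whole line, the t value lands in $8 only for lines shaped exactly like the sample ($1 through $7 are the IP address, the two dashes, the two halves of the timestamp, "GET, and the path up to t); a request line with extra or missing tokens would shift that field number, hence the EXACTLY above.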
or entirely in awk:
$ cat tst.awk
{
    split($7,a,/[=&]/)
    sum[a[2]]++
}
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (val in sum) {
        print val "\t- " sum[val]
    }
}
$ awk -f tst.awk file
70 - 1
or:
$ cat tst.awk
BEGIN { FS="[=& ]" }
{ sum[$8]++ }
END {
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (val in sum) {
        print val "\t- " sum[val]
    }
}
$ awk -f tst.awk file
70 - 1
The above uses GNU awk 4.* for PROCINFO["sorted_in"] to sort the output. If you don't have that, remove that line and pipe the output to sort -rn with appropriate args; either way you do not need the intermediate | sort | uniq -c.
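For instance, here is a portable sketch with any POSIX awk: print each count first so a plain sort -rn can order the lines numerically, then swap the columns back into the value - count layout:

$ awk '{split($7,a,/[=&]/); sum[a[2]]++} END {for (val in sum) print sum[val], val}' file |
    sort -rn |
    awk '{print $2 " - " $1}'
70 - 1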