I have a log file:
10.1.1.10 arcesium.com [17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4081 "http://www. example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"
10.1.1.11 arcesium.com [17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4084 "http://www. example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"
10.1.1.13 arcesium.com [17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4082 "http://www. example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"
I want to get the 9th field:
awk '{print $9}' file.txt
4081
4084
4082
But the problem is that if the 3rd column, "[17/Dec/2018:08:05:32 +0000]", contains one more space, my value moves to the 10th column.
How can I treat each such value as a single field, irrespective of the spaces inside it?
I want to achieve this using awk.
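To illustrate the problem, here is a minimal sketch. It assumes shortened records (referrer and user agent dropped) and a hypothetical extra space right after the opening bracket; the variable names are only for illustration.

```shell
# Shortened sample records (assumption: referrer and user agent omitted).
normal='10.1.1.10 arcesium.com [17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4081'
# Hypothetical variant with one extra space after "[".
shifted='10.1.1.10 arcesium.com [ 17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4081'

# Default whitespace splitting: the size is field 9 in the normal record.
normal_f9=$(printf '%s\n' "$normal" | awk '{ print $9 }')

# With the extra space, every later field moves one position to the right.
shifted_f9=$(printf '%s\n' "$shifted" | awk '{ print $9 }')
shifted_f10=$(printf '%s\n' "$shifted" | awk '{ print $10 }')

echo "normal  \$9  = $normal_f9"     # 4081
echo "shifted \$9  = $shifted_f9"    # 200
echo "shifted \$10 = $shifted_f10"   # 4081
```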
Answer
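In GNU awk (gawk) you can use FPAT, which splits fields by their content instead of by a separator:
awk 'BEGIN{FPAT = "(\"[^\"]+\")|(\\[[^\\]]+\\])|([^ ]+)" } {print $6}' file.txt
Inside the awk string, each " is escaped as \" and each backslash is doubled. You get:
4081
4084
4082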
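The following sketch checks that the field number stays the same when a bracketed timestamp contains an extra space. It assumes gawk is installed under the name `gawk`; the temporary file and the extra space in the second record are made up for the demonstration.

```shell
# Temporary sample file (hypothetical; the second record has an extra space after "[").
log=$(mktemp)
cat > "$log" <<'EOF'
10.1.1.10 arcesium.com [17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4081 "http://www. example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"
10.1.1.11 arcesium.com [ 17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4084 "http://www. example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"
EOF

# FPAT describes what a field looks like: a quoted string, a bracketed
# group, or a run of non-space characters. The size is field 6 either way.
sizes=$(gawk 'BEGIN { FPAT = "(\"[^\"]+\")|(\\[[^\\]]+\\])|([^ ]+)" } { print $6 }' "$log")
echo "$sizes"

rm -f "$log"
```

Both records should report their size as field 6, since the whole bracketed timestamp counts as one field regardless of the spaces inside it.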
For column 1,
awk 'BEGIN{FPAT = "(\"[^\"]+\")|(\\[[^\\]]+\\])|([^ ]+)" } {print $1}' file.txt
you get,
10.1.1.10
10.1.1.11
10.1.1.13
For column 3, for example,
awk 'BEGIN{FPAT = "(\"[^\"]+\")|(\\[[^\\]]+\\])|([^ ]+)" } {print $3}' file.txt
you get,
[17/Dec/2018:08:05:32 +0000]
[17/Dec/2018:08:05:32 +0000]
[17/Dec/2018:08:05:32 +0000]
For column 4, for example,
awk 'BEGIN{FPAT = "(\"[^\"]+\")|(\\[[^\\]]+\\])|([^ ]+)" } {print $4}' file.txt
you get,
"GET /api/v1/services HTTP/1.1"
"GET /api/v1/services HTTP/1.1"
"GET /api/v1/services HTTP/1.1"
REGEX Explanation
- 1st Alternative ("[^"]+"): matches a field that starts with " and ends with ", e.g. "GET /api/v1/services HTTP/1.1"
- 2nd Alternative (\[[^\]]+\]): matches a field that starts with [ and ends with ], e.g. [17/Dec/2018:08:05:32 +0000]. Note that in awk the escapes \[ and \] are mandatory to match the brackets literally.
- 3rd Alternative ([^ ]+): matches a whole word, i.e. a run of non-space characters, e.g. 10.1.1.10 or arcesium.com
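To see which text each alternative picks up, the same three patterns can be applied one token at a time. The sketch below is not part of the FPAT answer: it emulates the splitting with the standard match(), RSTART, RLENGTH and substr(), so it should also work in awk versions without FPAT. The names `rest` and `n` are illustrative.

```shell
line='10.1.1.10 arcesium.com [17/Dec/2018:08:05:32 +0000] "GET /api/v1/services HTTP/1.1" 200 4081 "http://www. example.com/" "Mozilla/5.0 (X11; Linux x86_64; rv:25.0) Gecko/20100101 Firefox/25.0"'

fields=$(printf '%s\n' "$line" | awk '{
    rest = $0   # part of the record not yet consumed
    n = 0       # running field number
    # Take the leftmost, longest match of: quoted string, bracketed
    # group, or run of non-space characters, then continue after it.
    while (match(rest, /"[^"]+"|\[[^]]+\]|[^ ]+/)) {
        n++
        printf "%d: %s\n", n, substr(rest, RSTART, RLENGTH)
        rest = substr(rest, RSTART + RLENGTH)
    }
}')
echo "$fields"
```

For the first sample record this lists eight fields, with the bracketed timestamp as field 3, the quoted request as field 4 and the size 4081 as field 6.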