I need to count the occurrences of 150+ eventType/errorCode pairs in 1700 files each day. That means I have to loop over the 1700 files, count the occurrences of each eventType/errorCode pair, and write those counts to a text file as a daily report.
I have placed those eventType/errorCode pairs in a text file, one comma-separated pair per line:
10008,4569
10008,4568
10003,1200
40000,4006
My initial code:
#!/bin/bash
DT=$(date +%Y%m%d%H)                            # today's date (with hour)
fileName=$(date --date="-1 day" +"%Y%m%d")      # file name part for yesterday's date
Yesterday=$(date --date="-1 day" +"%Y-%m-%d")   # yesterday's date
cd "/advdata/datashareB/FFFF/continuousDownstream/$Yesterday"
### Here I want to loop through the text file that contains the errorCodes/eventTypes and search for them in the 1700 files.
### In the loop I have to execute the following command:
### eventExport -printEvents -file Run_${fileName}*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | wc -l
The output should be written to a text file in the following format:
Date      10008/4569  10008/4568  10003/1200  ...
20160621  100         12800       58          ...
where the first row is the header and the second row holds the count for each eventType/errorCode pair.
Every day the script should append a new line of values to the output file (a text file).
How can I write this loop?
EDIT:
The files are tar files named like Run_20160622_105700_02of04.tar. eventExport reads those tar files and extracts the error codes and event types given in its arguments. The command looks like this:
eventExport -printEvents -file Run_20160526_09*_*.tar -filter "ErrorCode=4569;EventType=10008" -names -silent | head | awk -F, '{OFS =","; print $3, $8,$9, $14}'
The output is:
AccessKey="706385970",EventType=10008,OrigEventTime=2016-06-21 23:29:42.000,ErrorCode=4569
Here, each eventType is associated with an errorCode. I have more than 150 eventType/errorCode pairs, and I want to get their counts from the tar files. More than 1700 tar files are generated per day.
Answer
Here is a GNU awk script (as its own script file, for reusability) that reads the event types and error codes, parses the log file, and reports the count of each matching event type/error code pair for each date.
#!/usr/bin/awk -f

/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code
    split($0, data, ",");
    keys[data[1]][data[2]] = 0;
}

match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file
    if (key[1] in keys && key[2] in keys[key[1]]) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        datecount[date[1]][key[1]][key[2]]++;
    }
}

END {
    for (d in datecount) {
        for (k1 in datecount[d]) {
            for (k2 in datecount[d][k1]) {
                printf("%s\t%s/%s\t%d\n", d, k1, k2, datecount[d][k1][k2]);
            }
        }
    }
}
Running it (note that this requires GNU awk):
$ awk -f script.awk codes.txt run.log
The output is not quite in the format that you wanted, but I’m hoping it’s close enough:
2016-06-11  10008/4569  1
2016-06-21  10008/4569  4
2016-06-21  40000/4006  1
(I duplicated the data that you gave us a few times and changed a date and one of the event type/error code pairs.)
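If the one-row-per-day layout from the question is needed, the long output can be pivoted afterwards. The following is only a sketch under assumptions: the function name `pivot` and the temporary files are hypothetical, the sample values are the ones shown above, and the script's output is taken to be tab-separated. It uses plain POSIX awk features, so it does not need GNU awk. Pairs with no match are printed as 0, and the date keeps the 2016-06-21 form found in OrigEventTime rather than 20160621.

```bash
#!/bin/bash
# Sample long-format report (date, type/code, count), tab-separated.
long_report=$(mktemp)
printf '%s\t%s\t%s\n' \
    2016-06-11 10008/4569 1 \
    2016-06-21 10008/4569 4 \
    2016-06-21 40000/4006 1 > "$long_report"

# Code list in the same "eventType,errorCode" form as codes.txt.
codes=$(mktemp)
printf '%s\n' 10008,4569 10008,4568 10003,1200 40000,4006 > "$codes"

# pivot CODES_FILE LONG_REPORT
# The first file fixes the column order, the second fills in the counts.
pivot() {
    awk -F'\t' '
        NR == FNR {                        # first file: the code list
            if ($0 != "") { sub(",", "/"); col[++ncol] = $0 }
            next
        }
        !($1 in seen) { seen[$1] = 1; day[++nday] = $1 }   # keep first-seen date order
        { count[$1, $2] = $3 }
        END {
            printf "Date"
            for (c = 1; c <= ncol; c++) printf "\t%s", col[c]
            print ""
            for (d = 1; d <= nday; d++) {
                printf "%s", day[d]
                for (c = 1; c <= ncol; c++)
                    printf "\t%d", count[day[d], col[c]]   # missing pair prints 0
                print ""
            }
        }' "$1" "$2"
}

pivot "$codes" "$long_report"
rm -f "$long_report" "$codes"
```

For a daily report file, the header line would be written once and only the data rows appended on later runs.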
UPDATE: I reworked the script for GNU awk versions older than 4.0 (which do not understand arrays of arrays):
#!/usr/bin/awk -f

/^[0-9]+,[0-9]+$/ {
    # this line contains event type and error code
    split($0, data, ",");
    keys[data[1],data[2]] = 1;
}

match($0, "EventType=([0-9]+).*ErrorCode=([0-9]+)", key) {
    # this line is from the log file
    if (keys[key[1],key[2]] == 1) {
        match($0, "OrigEventTime=([0-9-]+)", date);
        count[date[1],key[1],key[2]]++;
    }
}

END {
    for (comb in count) {
        split(comb, field, SUBSEP);
        printf("%s\t%s/%s\t%s\n", field[1], field[2], field[3], count[comb]);
    }
}
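For comparison, the per-pair loop that the question literally asks for might look roughly like the sketch below. The eventExport invocation is copied from the question and its behavior is not verified here. The function names `count_events` and `report_row` and the file names `codes.txt` and `report.txt` are hypothetical. The block only defines the functions; the commented lines at the end show one possible way to call them.

```bash
#!/bin/bash
# count_events DAY EVENT_TYPE ERROR_CODE
# Wraps the eventExport call from the question (assumed to print one line per event).
# stdin is redirected so the command cannot swallow the code list read by the loop below.
count_events() {
    eventExport -printEvents -file Run_"$1"*_*.tar \
        -filter "ErrorCode=$3;EventType=$2" -names -silent < /dev/null | wc -l
}

# report_row CODES_FILE DAY
# Prints a header line and one row of counts for DAY (yyyymmdd), tab-separated.
report_row() {
    local codes_file="$1" day="$2"
    local header="Date" row="$2" tab event_type error_code count
    tab=$(printf '\t')
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS=, read -r event_type error_code || [ -n "$event_type" ]; do
        [ -n "$event_type" ] || continue           # skip blank lines
        count=$(count_events "$day" "$event_type" "$error_code" | tr -d '[:space:]')
        header="$header$tab$event_type/$error_code"
        row="$row$tab$count"
    done < "$codes_file"
    printf '%s\n%s\n' "$header" "$row"
}

# Possible daily use, run from the directory that holds the tar files:
#   fileName=$(date --date="-1 day" +"%Y%m%d")
#   [ -s report.txt ] || report_row codes.txt "$fileName" | head -n 1 >> report.txt
#   report_row codes.txt "$fileName" | tail -n 1 >> report.txt
```

Note that this runs eventExport once per pair, so with 150+ pairs the 1700 tar files are read 150+ times (and the commented usage builds the row twice on the first day). The awk scripts above need the exported events only once, which is likely to be much faster.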