I need to grep a file called daily_fail_counts.csv, but only to find the number of failures. The file contains lines like these (a short excerpt):
January,1,0,0
January,1,1,0
January,1,2,0
January,1,3,0
January,1,4,0
January,1,5,0
January,1,6,0
January,1,7,0
January,1,8,0
Its format is “month,day,hour,failures” and it goes through all months. The last value is the number of failures found at that time. They are all 0 here only because no failures were found on that date; other dates have failures.
I’m not very good with grep commands in Linux scripts, so my question is: how do I grep to find just the last number on each line of the file?
I’m writing this script in a file called make_accum_fail_counts.sh and I will run it as such:
bash make_accum_fail_counts.sh daily_fail_counts.csv > accum_fail_counts.csv
So I’m using the daily_fail_counts.csv as the input for the new script. Here’s my script so far:
#!/bin/bash
if [ $# == 1 ]
then
    logFile=$1
fi
cat $logFile > tmpFile
hour=0
failure=0
while [ $hour -le 23 ]
do
    if [ $hour -le 23 ]
    then
        failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`
    fi
    echo "$hour,$failure"
    hour=$((hour+1))
    failure=0
done
rm -rf tmpFile
I just need help with my grep command:
failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`
I just want to find the failures for each hour, summed across all the days, so its output would be:
0,1000
1,1040
2,2888
Where there were 1000 failures between 0:00-1:00, 1040 failures between 1:00-2:00 and so on. Thanks in advance.
Answer
cat yourfile.csv | cut -d',' -f 4 | paste -s -d+ - | bc
That sums all the failures. cut -d',' -f 4 yourfile.csv splits each line on the commas and takes the 4th value, which gives you a list of numbers. paste -s -d+ - then joins that list into one line with + signs between the numbers, and bc evaluates the resulting expression.
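To make the stages visible, here is a minimal sketch on a few made-up rows (the sample values and the temporary file are assumptions for illustration, not data from the question). It evaluates the joined expression with bash arithmetic rather than bc, which only works for whole numbers but avoids needing bc installed.

```shell
#!/bin/bash
# Hypothetical sample rows in the month,day,hour,failures layout.
sample=$(mktemp)
printf '%s\n' 'January,1,0,3' 'January,1,1,0' 'January,1,2,5' > "$sample"

# cut keeps only field 4; paste -s joins all lines into one, separated by "+".
expr=$(cut -d',' -f 4 "$sample" | paste -s -d+ -)
echo "$expr"     # the expression that would otherwise be piped to bc

# Bash arithmetic evaluates the expression (integers only).
total=$(( $expr ))
echo "$total"

rm -f "$sample"
```

Piping the same expression to bc, as in the command above, should give the same total.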
You can grep to filter it down to the hour, something like
cat yourfile.csv | cut -d',' -f 3,4 | grep ^0, | cut -d',' -f 2
To get all the 0th hour failure counts.
for hour in {0..23}; do
    cat yourfile.csv | cut -d',' -f 3,4 | grep ^$hour, | cut -d',' -f 2 | paste -s -d+ - | bc
done
To get the totals for each hour.
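The loop above reads the file once per hour and prints only the totals. One possible alternative, which is a different technique from the cut and grep pipeline in this answer, is a single awk pass that prints lines in the hour,total format the question asks for. The sample rows and temporary file below are made up for illustration.

```shell
#!/bin/bash
# Hypothetical sample rows: two days, two hours each.
sample=$(mktemp)
printf '%s\n' 'January,1,0,2' 'January,1,1,4' 'January,2,0,5' 'January,2,1,1' > "$sample"

# Add field 4 (failures) into a running total keyed by field 3 (hour),
# then print every hour from 0 to 23, using 0 for hours with no rows.
hourly=$(awk -F',' '
    { total[$3] += $4 }
    END { for (h = 0; h <= 23; h++) print h "," (total[h] + 0) }
' "$sample")
echo "$hourly"

rm -f "$sample"
```

With the real file, replacing "$sample" with daily_fail_counts.csv and redirecting the output to accum_fail_counts.csv should fit the way the script is run in the question.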
If you want them grouped by day, you can read about the date command, figure out how to get it to output strings like January,1, and add an outer for loop to the above command that passes each line through a grep with the output of that date command.
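A rough sketch of that idea follows. It assumes GNU date (the -d option and the %-d format are GNU extensions), an arbitrarily chosen year of 2023 since the file has no year column, and made-up sample rows. It covers only two days to stay short; a full run would loop over every day of the year.

```shell
#!/bin/bash
# Hypothetical sample rows: two days, two hours each.
sample=$(mktemp)
printf '%s\n' 'January,1,0,2' 'January,1,1,4' 'January,2,0,5' 'January,2,1,1' > "$sample"

daily=""
for offset in 0 1; do
    # GNU date: %B is the full month name, %-d the day without zero padding.
    # LC_ALL=C keeps the month name in English. The year 2023 is an assumption.
    prefix=$(LC_ALL=C date -d "2023-01-01 +${offset} days" '+%B,%-d,')
    # Keep that day's rows, take the failures column, join with "+".
    sum=$(grep "^${prefix}" "$sample" | cut -d',' -f 4 | paste -s -d+ -)
    # Evaluate the sum with bash arithmetic; days with no rows count as 0.
    daily+="${prefix}$(( ${sum:-0} ))"$'\n'
done
printf '%s' "$daily"

rm -f "$sample"
```

The trailing comma in the prefix matters: it stops January,1, from also matching January,10, and later days.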
Personally, at this point I would start writing Python instead of bash. The pandas library is better suited for this.