
How to use grep commands to find a specific value in a text file

I need to grep a file called daily_fail_counts.csv, but only to find the number of failures. Here is a shortened sample of what the file contains:

January,1,0,0
January,1,1,0
January,1,2,0
January,1,3,0
January,1,4,0
January,1,5,0
January,1,6,0
January,1,7,0
January,1,8,0

Its format is “month,day,hours,failures.” It goes through all months. The last value is the number of failures found at that time. I know it all says 0 here, but that’s because no failures were found at those times; other dates have failures.

I’m not very good with grep commands in Linux scripts, so my question is this: how do I grep to find just the last value on each line?

I’m writing this script in a file called make_accum_fail_counts.sh and I will run it as such:

bash make_accum_fail_counts.sh daily_fail_counts.csv > accum_fail_counts.csv

So I’m using the daily_fail_counts.csv as the input for the new script. Here’s my script so far:

#!/bin/bash

if [ $# == 1 ]
then
    logFile=$1
fi

cat $logFile > tmpFile

hour=0
failure=0

while [ $hour -le 23 ]
do
    if [ $hour -le 23 ]
    then
        failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`
    fi
    echo "$hour,$failure"
    hour=$((hour+1))
    failure=0
done
rm -rf tmpFile

I just need help with my grep command:

failure=`grep "*,*,*,^[0-10]" tmpFile | wc -l`

I just want to find, across all the days, the total failures for each hour, so its output would be:

0,1000
1,1040
2,2888

Where there were 1000 failures between 0:00-1:00, 1040 failures between 1:00-2:00 and so on. Thanks in advance.


Answer

cut -d',' -f 4 yourfile.csv | paste -s -d+ - | bc

To sum all the failures. cut -d',' -f 4 yourfile.csv splits each line on the commas and keeps the 4th value, which gives you a list of numbers, one per line. paste -s -d+ - then joins that list into a single expression such as 0+0+3, and bc evaluates it.

You can use grep to filter it down to a single hour, with something like

cut -d',' -f 3,4 yourfile.csv | grep '^0,' | cut -d',' -f 2

To get all the 0th hour failure counts.
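
The ^ anchor and the trailing comma in that grep pattern matter: ^1, matches only hour 1, whereas a bare 1 would also match hours 10 to 19 and any failure count containing a 1. A small sketch on made-up data shows this; the function name failures_for_hour and the sample rows are hypothetical, not part of the original file:

```shell
#!/bin/bash

# failures_for_hour FILE HOUR (hypothetical helper): print the failure counts
# recorded for one hour, one per line.
failures_for_hour() {
    # Keep "hour,failures", select lines whose hour field is exactly $2,
    # then drop the hour so only the counts remain.
    cut -d',' -f 3,4 "$1" | grep "^$2," | cut -d',' -f 2
}

# Made-up sample rows in the month,day,hour,failures format.
sample=$(mktemp)
cat > "$sample" <<'EOF'
January,1,1,5
January,1,10,7
January,1,11,1
January,2,1,2
EOF

failures_for_hour "$sample" 1    # prints 5 and 2; the hour-10 and hour-11 rows are skipped
rm -f "$sample"
```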

for hour in {0..23}; do
    cut -d',' -f 3,4 yourfile.csv | grep "^$hour," | cut -d',' -f 2 | paste -s -d+ - | bc
done

To get the totals for each hour.
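
The loop above prints only the totals, while the question asked for hour,total lines. A minimal sketch of one way to get that format follows. It swaps in awk for the cut/grep/paste/bc pipeline so the file is read once instead of 24 times. The function name hourly_totals is made up for illustration, and the sketch assumes every line has exactly the four fields month,day,hour,failures:

```shell
#!/bin/bash

# hourly_totals FILE (hypothetical helper): print "hour,total" for hours 0-23,
# summing the 4th field over every line whose 3rd field is that hour.
hourly_totals() {
    awk -F',' '
        { sum[$3 + 0] += $4 }                 # $3 + 0 forces a numeric key, so "07" and "7" match
        END {
            for (h = 0; h <= 23; h++)
                printf "%d,%d\n", h, sum[h]   # hours never seen print as 0
        }
    ' "$1"
}

# Run as in the question:
#   bash make_accum_fail_counts.sh daily_fail_counts.csv > accum_fail_counts.csv
if [ $# -eq 1 ]; then
    hourly_totals "$1"
fi
```

Because the hour is printed alongside each total, an hour with no matching lines still shows up as a line ending in 0 rather than as a blank.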

If you want them grouped by day, you can read about the date command, figure out how to get it to output strings like January,1, and add an outer for loop to the above command that passes each line through a grep with the output of that date command.
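
If the goal is simply one total per day, a shorter route than looping over date output is to let the file supply the month,day pairs itself. The following is a minimal sketch under that assumption, using awk rather than the date and grep approach described above. daily_totals is a hypothetical name, and the month,day,total output format is an assumption, since the question does not specify one:

```shell
#!/bin/bash

# daily_totals FILE (hypothetical helper): print "month,day,total" for each
# distinct month,day pair, in the order the pairs first appear in the file.
daily_totals() {
    awk -F',' '
        {
            key = $1 "," $2
            if (!(key in sum)) order[++n] = key   # remember first-seen order
            sum[key] += $4
        }
        END {
            for (i = 1; i <= n; i++)
                print order[i] "," sum[order[i]]
        }
    ' "$1"
}

# Example: daily_totals daily_fail_counts.csv
if [ $# -eq 1 ]; then
    daily_totals "$1"
fi
```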

Personally, at this point I would start writing Python instead of bash. The pandas library is better suited for this.
