I have several Apache access log files that I would like to clean up a bit before I analyze them. I am trying to use grep in the following way:
grep -v term_to_grep apache_access_log
I have several terms that I want to exclude, so I am chaining one grep per term as follows:
grep -v term_to_grep_1 apache_access_log | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > apache_access_log_cleaned
Up to this point, my rudimentary approach works as expected! But I have many Apache access logs, and I don’t want to do that for every file. I have started to write a bash script, but so far I couldn’t make it work. This is my attempt:
for logs in ./access_logs/*;
do
cat $logs | grep -v term_to_grep | grep -v term_to_grep_2 | grep -v term_to_grep_3 | grep -v term_to_grep_n > $logs_clean
done;
Could anyone point out what I am doing wrong?
Answer
If you have a variable and you append _clean to its name, the shell reads that as a new variable (here, logs_clean), not as the value of the old one with _clean appended. To fix that, use curly braces:
$ var=file.log
$ echo "<$var>"
<file.log>
$ echo "<$var_clean>"
<>
$ echo "<${var}_clean>"
<file.log_clean>
Without the braces, your pipeline tries to redirect to the empty string, which results in an error. Note that "$logs"_clean would also work.
As for your pipeline, you could combine that into a single grep command:
grep -Ev 'term_to_grep|term_to_grep_2|term_to_grep_3|term_to_grep_n' "$logs" > "${logs}_clean"
No cat needed, and only a single invocation of grep.
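Putting the two fixes together, the whole loop might look like the sketch below. The temporary directory, the file names site1.log and site2.log, and the sample log lines are made up purely so the example runs on its own; in practice the loop would point at ./access_logs as in the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample data so the sketch is self-contained.
workdir=$(mktemp -d)
mkdir "$workdir/access_logs"
printf '%s\n' 'GET /index.html' 'GET /term_to_grep_2/x' \
    'GET /about term_to_grep_n' 'GET /contact' > "$workdir/access_logs/site1.log"
printf '%s\n' 'GET /term_to_grep here' 'GET /home' > "$workdir/access_logs/site2.log"

# One grep per file; the braces keep _clean out of the variable name.
for logs in "$workdir"/access_logs/*.log; do
    grep -Ev 'term_to_grep|term_to_grep_2|term_to_grep_3|term_to_grep_n' \
        "$logs" > "${logs}_clean"
done

# Show the cleaned results.
cat "$workdir"/access_logs/*_clean
```

The glob here is *.log rather than * (an assumption about how the files are named) so that a second run does not pick up the *_clean files written by the first one.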
Or you could stick all your terms into a file:
$ cat excludes
term_to_grep_1
term_to_grep_2
term_to_grep_3
term_to_grep_n
and then use the -f option:
grep -vf excludes "$logs" > "${logs}_clean"
If your terms are fixed strings rather than regular expressions, you might be able to speed this up by using -F (“fixed strings”):
grep -vFf excludes "$logs" > "${logs}_clean"
I think GNU grep checks that for you on its own, though.
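Apart from speed, -F can also change which lines are removed when a term contains regex metacharacters such as a dot. The following sketch illustrates that with a made-up excludes file holding a single IP address; the addresses, file names, and temporary directory are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: three log lines and one term to exclude.
workdir2=$(mktemp -d)
printf '%s\n' '192.168.1.1 GET /a' '192.168.111.5 GET /b' '10.0.0.7 GET /c' \
    > "$workdir2/access.log"
printf '%s\n' '192.168.1.1' > "$workdir2/excludes"

# Terms read as regular expressions: each "." matches any character,
# so the pattern also matches the "192.168.111" part of the second line.
grep -vf "$workdir2/excludes" "$workdir2/access.log" > "$workdir2/regex_clean"

# Terms read as fixed strings: only the literal text 192.168.1.1 is excluded.
grep -vFf "$workdir2/excludes" "$workdir2/access.log" > "$workdir2/fixed_clean"

echo "regex:"; cat "$workdir2/regex_clean"
echo "fixed:"; cat "$workdir2/fixed_clean"
```

With this sample data, the regex variant drops the 192.168.111.5 line as well, while the -F variant keeps it. If the terms are meant literally, -F is probably the safer choice.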