I’m writing a script to search a logfile for a given python regex pattern. Setting aside the fact that this would be much easier to do using a simple Bash script, can it be done in Python? Here’s what I’ve run into:
Assumptions:
- I’m trying to analyze the file
/var/log/auth.log
- (for the sake of simplicity, I’m omitting the ability to choose a file.)
- the name of my cli module is
logscour
. - for the sake of argument,
logscour
takes only one arg calledregex_in
.
Intended usage:
[root@localhost]: # logscour 'd{1,3}.d{1,3}.d{1,3}.d{1,3}'
Should return the lines inside of /var/log/auth.log
that contain an IPv4 address.
I want to find a sort of anti-re.escape()
, as I am in backslash-hell. Here’s a snippet:
import re import argparse def main(regex_in, logfile='/var/log/auth.log'): ## Herein lies the problem! # user_regex_string = re.escape(regex_in) #<---DOESN'T WORK, EVEN MORE ESCAPE-SLASHES # user_regex_string = r'{}'.format(regex_in) #<---DOESN'T WORK user_regex_string = regex_in #<---DOESN'T WORK EITHER GAHHH with open(logfile, 'rb+') as authlog: for aline in authlog: if re.match(user_regex_string, aline): print aline if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument("regex_in", nargs="?", help="enter a python-compliant regex string. Parentheses & matching groups not supported.", default=None) args = parser.parse_args() if not args.regex_in: raise argparse.ArgumentError('regex_in', message="you must supply a regex string") main(args.regex_in)
This is giving me back nothing, as one would expect due to the fact that I’m using Python2.7 and these are bytestrings I’m dealing with.
Does anyone know a way to convert 'foo'
to r'foo'
, or an “opposite” for re.escape()
?
Advertisement
Answer
user_regex_string = re.compile(regex_in)
and
re.search(user_regex_string, aline)
should work fine. You need re.search
instead of re.match
because the IP address isn’t necessarily at the start of a line.
I always find re.match
very convenient in order to introduce subtle bugs in my code. 🙂
On my server, logscour 'd{1,3}.d{1,3}.d{1,3}.d{1,3}'
outputs:
May 28 17:38:53 dmzXX sshd[1736]: Received disconnect from 123.200.20.158: 11: Bye Bye [preauth] May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158 ...
That being said grep -P 'pattern' file
would also work:
grep -P "d{1,3}.d{1,3}.d{1,3}.d{1,3}" /var/log/auth.log
-P
stands for:
-P, --perl-regexp Interpret PATTERN as a Perl regular expression (PCRE, see below). This is highly experimental and grep -P may warn of unimplemented features.
-P
is needed in order to interpret d
as [0-9]