I’m writing a script to search a logfile for a given python regex pattern. Setting aside the fact that this would be much easier to do using a simple Bash script, can it be done in Python? Here’s what I’ve run into:
Assumptions:
- I’m trying to analyze the file
/var/log/auth.log- (for the sake of simplicity, I’m omitting the ability to choose a file.)
- the name of my cli module is
logscour. - for the sake of argument,
logscourtakes only one arg calledregex_in.
Intended usage:
[root@localhost]: # logscour 'd{1,3}.d{1,3}.d{1,3}.d{1,3}'
Should return the lines inside of /var/log/auth.log that contain an IPv4 address.
I want to find a sort of anti-re.escape(), as I am in backslash-hell. Here’s a snippet:
import re
import argparse
def main(regex_in, logfile='/var/log/auth.log'):
## Herein lies the problem!
# user_regex_string = re.escape(regex_in) #<---DOESN'T WORK, EVEN MORE ESCAPE-SLASHES
# user_regex_string = r'{}'.format(regex_in) #<---DOESN'T WORK
user_regex_string = regex_in #<---DOESN'T WORK EITHER GAHHH
with open(logfile, 'rb+') as authlog:
for aline in authlog:
if re.match(user_regex_string, aline):
print aline
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument("regex_in", nargs="?", help="enter a python-compliant regex string. Parentheses & matching groups not supported.", default=None)
args = parser.parse_args()
if not args.regex_in:
raise argparse.ArgumentError('regex_in', message="you must supply a regex string")
main(args.regex_in)
This is giving me back nothing, as one would expect due to the fact that I’m using Python2.7 and these are bytestrings I’m dealing with.
Does anyone know a way to convert 'foo' to r'foo', or an “opposite” for re.escape()?
Advertisement
Answer
user_regex_string = re.compile(regex_in)
and
re.search(user_regex_string, aline)
should work fine. You need re.search instead of re.match because the IP address isn’t necessarily at the start of a line.
I always find re.match very convenient in order to introduce subtle bugs in my code. 🙂
On my server, logscour 'd{1,3}.d{1,3}.d{1,3}.d{1,3}' outputs:
May 28 17:38:53 dmzXX sshd[1736]: Received disconnect from 123.200.20.158: 11: Bye Bye [preauth] May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158 ...
That being said grep -P 'pattern' file would also work:
grep -P "d{1,3}.d{1,3}.d{1,3}.d{1,3}" /var/log/auth.log
-P stands for:
-P, --perl-regexp Interpret PATTERN as a Perl regular expression (PCRE, see below). This is highly experimental and grep -P may warn of unimplemented features.
-P is needed in order to interpret d as [0-9]