Is it possible to pass regex strings as args to a Python CLI tool using argparse?

I’m writing a script to search a logfile for a given Python regex pattern. Setting aside the fact that this would be much easier with a simple Bash script, can it be done in Python? Here’s what I’ve run into:

Assumptions:

  • I’m trying to analyze the file /var/log/auth.log
    • (for the sake of simplicity, I’m omitting the ability to choose a file.)
  • the name of my cli module is logscour.
  • for the sake of argument, logscour takes only one arg called regex_in.

Intended usage:

[root@localhost]: # logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

Should return the lines inside of /var/log/auth.log that contain an IPv4 address.

I want to find a sort of anti-re.escape(), as I am in backslash-hell. Here’s a snippet:

import re
import argparse

def main(regex_in, logfile='/var/log/auth.log'):
    ## Herein lies the problem!
    # user_regex_string = re.escape(regex_in) #<---DOESN'T WORK, EVEN MORE ESCAPE-SLASHES
    # user_regex_string = r'{}'.format(regex_in) #<---DOESN'T WORK
    user_regex_string = regex_in                 #<---DOESN'T WORK EITHER GAHHH
    
    with open(logfile, 'rb+') as authlog:
        for aline in authlog:
            if re.match(user_regex_string, aline):
                print aline

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument("regex_in", nargs="?", help="enter a python-compliant regex string. Parentheses & matching groups not supported.", default=None)
    
    args = parser.parse_args()
    if not args.regex_in:
        raise argparse.ArgumentError('regex_in', message="you must supply a regex string")
    main(args.regex_in)

This is giving me back nothing, as one would expect due to the fact that I’m using Python2.7 and these are bytestrings I’m dealing with.

Does anyone know a way to convert 'foo' to r'foo', or an “opposite” for re.escape()?

Answer

user_regex_string = re.compile(regex_in)

and

re.search(user_regex_string, aline)

should work fine. You need re.search instead of re.match because the IP address isn’t necessarily at the start of a line.
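
A minimal sketch of that difference, written for Python 3 (the question uses 2.7, but re.match and re.search behave the same way in this respect). The sample line and the variable names are assumptions for illustration only. It also suggests why no “anti-re.escape()” is needed: the r prefix only changes how a literal in Python source code is parsed, so a pattern that arrives from the shell inside single quotes already contains the backslashes the regex engine expects.

```python
import re

# Hypothetical sample line, modeled on the auth.log output quoted in this answer.
line = "May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158"

# What the script receives when the shell argument is single-quoted:
# the backslashes arrive untouched, so this is the same string as the raw literal.
pattern_from_cli = "\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}"
assert pattern_from_cli == r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

ip_regex = re.compile(pattern_from_cli)

match_result = ip_regex.match(line)    # only tries at the start of the line
search_result = ip_regex.search(line)  # scans the whole line

print(match_result)            # None
print(search_result.group(0))  # 123.200.20.158
```

Because the line starts with a timestamp rather than a digit group, match finds nothing while search finds the address.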

I always find re.match very convenient in order to introduce subtle bugs in my code. 🙂

On my server, logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}' outputs:

May 28 17:38:53 dmzXX sshd[1736]: Received disconnect from 123.200.20.158: 11: Bye Bye [preauth]
May 28 17:38:54 dmzXX sshd[1738]: Invalid user guest from 123.200.20.158
...

That being said, grep -P 'pattern' file would also work:

grep -P "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" /var/log/auth.log

-P stands for:

   -P, --perl-regexp
          Interpret PATTERN as a Perl regular expression (PCRE, see below). This is highly experimental and grep -P may warn of unimplemented features.

-P is needed in order to interpret \d as [0-9].
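
For comparison with the grep one-liner, here is a rough Python 3 sketch of how the two changes might fit into the logscour script. It is not the asker's final code: build_parser, matching_lines, and the sample_log list are hypothetical names introduced here, and the list stands in for /var/log/auth.log so the example runs anywhere. Declaring regex_in as a plain positional argument (without nargs="?") makes argparse report a missing pattern itself, which would replace the manual ArgumentError check.

```python
import argparse
import re


def build_parser():
    """Hypothetical helper: one required positional argument, as in the question."""
    parser = argparse.ArgumentParser(prog="logscour")
    parser.add_argument("regex_in", help="Python regular expression to search for")
    return parser


def matching_lines(pattern, lines):
    """Return the lines in which the pattern occurs anywhere (re.search semantics)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]


# Assumption: sample data standing in for /var/log/auth.log.
# The second entry is invented to show a non-matching line.
sample_log = [
    "May 28 17:38:53 dmzXX sshd[1736]: Received disconnect from 123.200.20.158: 11: Bye Bye [preauth]",
    "May 28 17:38:55 dmzXX sshd[1740]: Server listening on port 22",
]

# An explicit argument list mimics: logscour '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
args = build_parser().parse_args([r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"])
hits = matching_lines(args.regex_in, sample_log)

for line in hits:
    print(line)
```

To run it against a real file, an open file object could be passed in place of sample_log, since matching_lines only iterates over its second argument.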

User contributions licensed under: CC BY-SA