Python: how to pass an optional filename parameter when running a script to process a file

I have a python script that processes an XML file each day (it is transferred via SFTP to a remote directory, then temporarily copied to a local directory) and stores its information in a MySQL database.

One of my parameters for the file is set to “date=today” so that the correct file is processed each day. This works fine and each day I successfully store new file information into the database.

What I need help with is passing a Linux command line argument so the script processes the file for a specific day (in case a previous day’s file needs to be rerun). I can manually edit my code to make this work, but that will not be an option once the project is in production.

In addition, I need to be able to pass in a command line argument for “date=*” and have the script process every file in my remote directory. Currently, this parameter processes only a single file: the first match in alphabetical order.

If my two questions should be asked separately, my mistake, and I’ll edit this question to just cover one of them. Example of my code below:

    today = datetime.datetime.now().strftime('%Y%m%d')

    file_var = local_file_path + connect_to_sftp.sftp_get_file(
                                               local_file_path=local_file_path,
                                               sftp_host=sftp_host,
                                               sftp_username=sftp_username,
                                               sftp_directory=sftp_directory,
                                               date=today)

    ET = xml.etree.ElementTree.parse(file_var).getroot()

    def parse_file():
        for node in ET.findall(.......)

In another module:

    def sftp_get_file(local_file_path, sftp_host, sftp_username, sftp_directory, date):

        pysftp.Connection(sftp_host, sftp_username)

        # find file in remote directory with given suffix
        remote_file = glob.glob(sftp_directory + '/' + date + '_file_suffix.xml')

        # strip directory name from full file name
        file_name_only = remote_file[0][len(sftp_directory):]

        # set local path to hold new file
        local_path = local_file_path

        # combine local path with filename that was loaded
        local_file = local_path + file_name_only

        # pull file from remote directory and send to local directory
        shutil.copyfile(remote_file[0], local_file)

        return file_name_only

So the SFTP module reads the file, transfers it to the local directory, and returns the file name to be used in the parsing module. The parsing module passes in the parameters and does the rest of the work.

What I need to be able to do, on certain occasions, is override the parameter that says “date=today” and instead say “date=20151225”, for example, but I must do this through a Linux command line argument.

In addition, if I currently enter the parameter of “date=*” it only runs the script for the first file that matches that parameter. I need the script to run for ALL files that match that parameter. Any help is much appreciated. Happy to answer any questions to improve clarity.

Answer

You can use the sys module and pass the date (which selects the file) as a command line argument:

    import datetime
    import sys

    # Use the first command line argument as the date if given, else today's date
    today = sys.argv[1] if len(sys.argv) > 1 else datetime.datetime.now().strftime('%Y%m%d')

If a first argument is given, the today variable holds the value from the command line; otherwise it falls back to the current date, as in your original code.
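If you want validation and a help message on top of that, argparse from the standard library is one option. Below is a minimal sketch, not a drop-in replacement: the function names parse_date_arg and valid_date are made up for illustration, and the YYYYMMDD-or-* rule is an assumption based on the examples in the question.

```python
import argparse
import datetime


def valid_date(value):
    """Accept '*' or an 8-digit YYYYMMDD date (assumed format, per the question)."""
    if value == "*":
        return value
    if len(value) != 8 or not value.isdigit():
        raise argparse.ArgumentTypeError("date must be YYYYMMDD or '*'")
    try:
        # Reject impossible dates such as 20151340
        datetime.datetime.strptime(value, "%Y%m%d")
    except ValueError:
        raise argparse.ArgumentTypeError("date must be a real calendar date")
    return value


def parse_date_arg(argv=None):
    """Return the date to process: the command line value if given, else today."""
    parser = argparse.ArgumentParser(description="Process the daily XML file.")
    parser.add_argument(
        "date",
        nargs="?",  # makes the positional argument optional
        type=valid_date,
        default=datetime.datetime.now().strftime("%Y%m%d"),
        help="date as YYYYMMDD, or '*' for every file (default: today)",
    )
    return parser.parse_args(argv).date
```

In the script you would then write today = parse_date_arg() and run it as, for example, python process_file.py 20151225 (the script name is hypothetical). Note that the wildcard has to be quoted on the command line, as in python process_file.py '*', otherwise the shell expands it before Python ever sees it.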

For the second question, the problem is this line:

    file_name_only = remote_file[0][len(sftp_directory):]

It only accesses the first element, but glob may return several files when you use the * wildcard. You must iterate over the remote_file list and copy every file in it.
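A rough sketch of that loop is below. It assumes, as the code in the question does, that the directory is reachable as a local path (glob only sees the local filesystem). The function name copy_matching_files is hypothetical, and unlike the original it returns a list of bare file names rather than a single name.

```python
import glob
import os
import shutil


def copy_matching_files(source_directory, local_file_path, date):
    """Copy every file matching the date pattern; return the copied file names."""
    # '_file_suffix.xml' is the placeholder suffix used in the question
    pattern = os.path.join(source_directory, date + "_file_suffix.xml")
    copied = []
    # sorted() gives a predictable, alphabetical processing order
    for source_file in sorted(glob.glob(pattern)):
        file_name_only = os.path.basename(source_file)
        shutil.copyfile(source_file, os.path.join(local_file_path, file_name_only))
        copied.append(file_name_only)
    return copied
```

The parsing module would then loop over the returned names and parse each one, building the path with os.path.join(local_file_path, name), instead of parsing a single file_var. An empty list means nothing matched, which is worth handling explicitly since remote_file[0] would raise an IndexError in that case.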

User contributions licensed under: CC BY-SA