
Find and copy specific files by date

I’ve been trying to get a script working to back up some files from one machine to another, but I keep running into an issue.

Basically what I want to do is copy two files, one .log and one (or more) .dmp. Their format is always as follows:

something_2022_01_24.log something_2022_01_24.dmp

I want to do three things with these files:

  • find the second-to-last .log file (i.e. if something_2022_01_24.log is the latest, I want the one before it, say something_2022_01_22.log)
  • get a substring with just the date (2022_01_22)
  • copy every .dmp that matches the date (e.g. something_2022_01_24.dmp, something01_2022_01_24.dmp)

For the first one, from what I could find, the best way is ls -t *.log | head -2, as it displays the second-to-last file created.

As for the second one I’m more at a loss because I’m not sure how to parse the output of the first command.

The third one I think I could manage with something of the sort:

[ -f "/var/www/my_folder/*$capturedate.dmp" ] && cp "/var/www/my_folder/*$capturedate.dmp" /tmp/

What do you guys think? Is there any way to do this? How can I compare the substring?

Thanks!


Answer

Would you please try the following:

#!/bin/bash

dir="/var/www/my_folder"

second=$(ls -t "$dir/"*.log | head -n 2 | tail -n 1)
if [[ $second =~ .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log$ ]]; then
    capturedate=${BASH_REMATCH[1]}
    cp -p "$dir/"*"$capturedate".dmp /tmp
fi
  • second=$(ls -t "$dir/"*.log | head -n 2 | tail -n 1) picks the second-to-last log file. Please note that it assumes the file's modification time has not changed since the file was created, and that the filenames do not contain special characters such as a newline. This is a simple solution and may need further work to be robust.
  • The regex .*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log$ matches the log filename. The date substring (enclosed in the parentheses) is captured and stored in the bash variable ${BASH_REMATCH[1]}.
  • Then the cp command does the job. Please be careful not to include the wildcard * within the double quotes, so that the wildcard is properly expanded.
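
If relying on modification times is a concern, one possible variation is to order the logs by the date in the file name instead. The following is only a sketch under that assumption: the function names second_latest_date and copy_dumps_for_date are made up for illustration, and it assumes every log name ends in _YYYY_MM_DD.log.

```shell
#!/bin/bash

# Hypothetical helper: print the second-newest date (YYYY_MM_DD) found in
# the *.log names under directory $1, judged by the name, not by mtime.
second_latest_date() {
    local dir=$1 f second
    local -a dates=()
    for f in "$dir"/*.log; do
        # Names without a trailing date (or an unmatched glob) are skipped.
        if [[ $f =~ _([0-9]{4}_[0-9]{2}_[0-9]{2})\.log$ ]]; then
            dates+=("${BASH_REMATCH[1]}")
        fi
    done
    # YYYY_MM_DD sorts chronologically as plain text; -u drops duplicates.
    second=$(printf '%s\n' "${dates[@]}" | sort -ru | sed -n '2p')
    # Fail if there are fewer than two distinct dates.
    [[ -n $second ]] && printf '%s\n' "$second"
}

# Hypothetical helper: copy every .dmp in $1 whose name ends in date $2 to $3.
copy_dumps_for_date() {
    local dir=$1 date=$2 dest=$3
    local -a dumps
    dumps=("$dir"/*"$date".dmp)   # the wildcard stays outside the quotes
    # An unmatched pattern is left as a literal string, so test for existence.
    [[ -e ${dumps[0]} ]] || return 1
    cp -p "${dumps[@]}" "$dest"
}
```

It could then be used as capturedate=$(second_latest_date "$dir") && copy_dumps_for_date "$dir" "$capturedate" /tmp.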

FYI, here are some alternatives for extracting the date string.

With sed:

capturedate=$(sed -E 's/.*_([0-9]{4}_[0-9]{2}_[0-9]{2})\.log$/\1/' <<< "$second")

With bash parameter expansion (if something does not include underscores):

capturedate=${second##*/}          # strip the directory, which may itself contain underscores
capturedate=${capturedate%.log}
capturedate=${capturedate#*_}

With the cut command (if something does not include underscores):

capturedate=$(basename "$second" .log | cut -d_ -f2,3,4)
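
If something may itself contain underscores, another option is to rely on the fixed width of the date rather than on the separators. This is just a sketch: the function name date_from_log is hypothetical, and it assumes the name always ends in a 10-character YYYY_MM_DD followed by .log.

```shell
#!/bin/bash

# Hypothetical helper: print the trailing YYYY_MM_DD of a log path.
date_from_log() {
    local name=${1##*/}       # drop the directory part
    name=${name%.log}         # drop the extension
    local date=${name: -10}   # last 10 characters; the space before - is required
    # Succeed only if those characters really look like a date.
    [[ $date =~ ^[0-9]{4}_[0-9]{2}_[0-9]{2}$ ]] && printf '%s\n' "$date"
}
```

For example, capturedate=$(date_from_log "$second") would work even for a name such as some_thing_2022_01_22.log.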
User contributions licensed under: CC BY-SA