
Bash script, find command, using wildcards or regex

I am writing a bash script that goes through all the files in a certain directory and:

  1. Picks the files with names that match a specified pattern
  2. Sorts them by date and time (date and time are part of the filename)
  3. Takes X oldest files
  4. Performs certain operations on them

The pattern used to match the files is passed to the script and looks like:

someprefix_[cats|dogs]_[oranges|apples|tomatos]_[2|3]*.txt
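For context on why -name struggles here: in the glob syntax that find -name uses, [cats|dogs] is a bracket expression that matches exactly one character (any of c, a, t, s, |, d, o or g), not a choice between two words, and plain globs have no alternation at all. As a minimal sketch, assuming bash with the extglob option enabled, the intended pattern can be written with @(...) groups; the helper name matches_pattern is hypothetical:

```shell
#!/usr/bin/env bash
# Enable extended globs before the function below is parsed.
shopt -s extglob

# Hypothetical helper: succeeds if its argument (a basename) matches the
# intended "cats or dogs, then a fruit, then 2 or 3" pattern.
matches_pattern() {
    [[ $1 == someprefix_@(cats|dogs)_@(oranges|apples|tomatos)_[23]*.txt ]]
}

# The original bracket form, kept in a variable so the | is not parsed as a
# pipe; unquoted on the right of == it is used as a glob pattern.
bracket='someprefix_[cats|dogs]_*'

for name in someprefix_cats_apples_2.txt someprefix_c_apples_2.txt; do
    if [[ $name == $bracket ]]; then
        echo "bracket glob matches: $name"
    else
        echo "bracket glob rejects: $name"
    fi
    if matches_pattern "$name"; then
        echo "extglob matches:      $name"
    else
        echo "extglob rejects:      $name"
    fi
done
```

With these two sample names, the bracket glob rejects someprefix_cats_apples_2.txt but accepts someprefix_c_apples_2.txt, while the extglob version does the opposite. find -name does not use bash's extglob syntax, which is one reason to switch to -regex.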

I tried to implement it as follows (fields 6 and 7 of the filename, split on _, are assumed to contain the date and time):

FILES=`find . -name "$PATTERN" | sort -t_ -k6 | head -n $NUM_OF_FILES`

It doesn't work. I tried various options with -name and -regex, but most examples online are for much simpler patterns. Since there might be hundreds of thousands of files to go through, I am looking for an efficient solution. I would like to avoid using sed for readability reasons.


Answer

Your find regex must match the entire path returned by find. For example, if you are searching somedir/ for your files, your regex must match the full path, e.g.

somedir/prefix_cats_apples_2.txt
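A quick way to see the whole-path rule in action, assuming GNU find (for -regextype) and a throwaway directory created just for the demonstration:

```shell
#!/usr/bin/env bash
# Assumes GNU find (for -regextype). Uses a scratch directory removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir somedir
touch somedir/prefix_cats_apples_2.txt

# Prints nothing: the regex is tested against "somedir/prefix_cats_apples_2.txt",
# and it must match that whole string, not just the basename.
echo "basename-only regex:"
find somedir/ -regextype posix-egrep -regex 'prefix_cats_apples_2\.txt'

# Prints the file: the directory part is spelled out.
echo "full-path regex:"
find somedir/ -regextype posix-egrep -regex 'somedir/prefix_cats_apples_2\.txt'

# Also prints the file: a leading .*/ accepts any directory prefix.
echo "directory-agnostic regex:"
find somedir/ -regextype posix-egrep -regex '.*/prefix_cats_apples_2\.txt'
```

Starting the regex with .*/ is one way to make it independent of the directory you pass to find.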

To complicate the picture, GNU find supports several regex dialects, selected with its -regextype option, e.g. emacs (the default), posix-awk, posix-basic, posix-egrep and posix-extended. (Standard POSIX basic regular expressions have no alternation operator.)
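The dialect matters because the same pattern is spelled differently. In GNU find's default emacs syntax, grouping and alternation need backslashes, \( \) and \|, whereas posix-extended (like posix-egrep) uses bare ( ) and |. A sketch, assuming GNU find and two made-up sample files, showing that both spellings select the same file:

```shell
#!/usr/bin/env bash
# Assumes GNU find. Creates a scratch directory that is removed on exit.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir somedir
touch somedir/prefix_cats_apples_2.txt somedir/prefix_dogs_pears_2.txt

# Default (emacs) dialect: backslash-escaped grouping and alternation.
emacs_hits=$(find somedir/ \
    -regex 'somedir/prefix_\(cats\|dogs\)_\(apples\|oranges\|tomatos\)_[23].*' | sort)

# posix-extended dialect: unescaped ( ) and |.
# -regextype must come before the -regex it applies to.
ere_hits=$(find somedir/ -regextype posix-extended \
    -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos)_[23].*' | sort)

echo "emacs:          $emacs_hits"
echo "posix-extended: $ere_hits"
```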

posix-egrep is probably the most portable of these across tools such as grep, sed and find. A posix-egrep regex for your pattern, when searching for the files in somedir/, would be:

'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$'

To test it, I used the following example files built from your pattern (the trailing number ranges from 0 to 3 to show that files ending in 0 or 1 are excluded):

$ ls -1 somedir/
prefix_cats_apples_0.txt
prefix_cats_apples_1.txt
prefix_cats_apples_2.txt
prefix_cats_apples_3.txt
prefix_cats_oranges_0.txt
prefix_cats_oranges_1.txt
prefix_cats_oranges_2.txt
prefix_cats_oranges_3.txt
prefix_cats_tomatos_0.txt
prefix_cats_tomatos_1.txt
prefix_cats_tomatos_2.txt
prefix_cats_tomatos_3.txt
prefix_dogs_apples_0.txt
prefix_dogs_apples_1.txt
prefix_dogs_apples_2.txt
prefix_dogs_apples_3.txt
prefix_dogs_oranges_0.txt
prefix_dogs_oranges_1.txt
prefix_dogs_oranges_2.txt
prefix_dogs_oranges_3.txt
prefix_dogs_tomatos_0.txt
prefix_dogs_tomatos_1.txt
prefix_dogs_tomatos_2.txt
prefix_dogs_tomatos_3.txt
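If you want to reproduce this test, the 24 files above can be created in one step with bash brace expansion (a convenience sketch that works in a fresh scratch directory, so nothing existing is touched):

```shell
#!/usr/bin/env bash
# Work in a scratch directory so existing files are not affected.
cd "$(mktemp -d)" || exit 1
mkdir somedir
# Brace expansion yields every combination: 2 animals x 3 fruits x 4 digits = 24 files.
touch somedir/prefix_{cats,dogs}_{apples,oranges,tomatos}_{0..3}.txt
ls -1 somedir/
```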

Matching only the files that satisfy your criteria and passing the result to a plain sort yields:

$ find somedir/ -regextype posix-egrep -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$' | sort
somedir/prefix_cats_apples_2.txt
somedir/prefix_cats_apples_3.txt
somedir/prefix_cats_oranges_2.txt
somedir/prefix_cats_oranges_3.txt
somedir/prefix_cats_tomatos_2.txt
somedir/prefix_cats_tomatos_3.txt
somedir/prefix_dogs_apples_2.txt
somedir/prefix_dogs_apples_3.txt
somedir/prefix_dogs_oranges_2.txt
somedir/prefix_dogs_oranges_3.txt
somedir/prefix_dogs_tomatos_2.txt
somedir/prefix_dogs_tomatos_3.txt
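One caveat about this regex: .*[23].* accepts a 2 or 3 anywhere after the fruit name, so once a date such as 20230105 is part of the filename, files whose trailing number is 0 or 1 can match as well. Assuming the 2 or 3 directly follows the underscore after the fruit name (as in the question's _[2|3]*.txt), a tighter posix-egrep pattern can pin that position and the .txt suffix. A sketch with made-up sample names, assuming GNU find:

```shell
#!/usr/bin/env bash
# Assumes GNU find; the file names below are made-up samples with a date part.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir somedir
touch somedir/prefix_cats_apples_0_20230105.txt \
      somedir/prefix_cats_apples_2_20230105.txt

# Loose regex from the answer: the "2" inside the date satisfies [23].
loose=$(find somedir/ -regextype posix-egrep \
    -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$' | sort)

# Tighter regex: [23] must directly follow "_<fruit>_", and the name must end
# in .txt (the dot is escaped so it is literal).
tight=$(find somedir/ -regextype posix-egrep \
    -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos)_[23][^/]*\.txt' | sort)

printf 'loose:\n%s\n\ntight:\n%s\n' "$loose" "$tight"
```

Here the loose regex returns both files, while the tight one returns only the _2_ file.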

Since you didn't give an example of where the date and time appear in the filenames, sorting by date/time is left to you. Let me know if you have further questions.
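As a starting point for the remaining steps, here is a sketch that sorts by date and time, keeps the oldest files and loops over them. It rests on assumptions that are not in the question: GNU find, sort and head (for -printf, -regextype and the -z NUL-delimited options), bash 4.4 or later (for mapfile -d ''), and a hypothetical name layout prefix_<animal>_<fruit>_<n>_<YYYYMMDD>_<HHMMSS>.txt in which the date and time are fields 5 and 6. Adjust the -k field numbers to your real layout (the question mentions fields 6 and 7).

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not from the question):
#   * GNU find/sort/head (-printf, -regextype, -z) and bash >= 4.4 (mapfile -d '').
#   * Hypothetical layout: prefix_<animal>_<fruit>_<n>_<YYYYMMDD>_<HHMMSS>.txt,
#     so date and time are "_"-separated fields 5 and 6 of the basename.
#   * dir and num_of_files stand in for the script's real inputs.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
num_of_files=2

# Sample files for the demonstration.
touch "$dir"/prefix_cats_apples_2_20230301_090000.txt \
      "$dir"/prefix_dogs_oranges_3_20230105_120000.txt \
      "$dir"/prefix_cats_tomatos_2_20230105_080000.txt \
      "$dir"/prefix_dogs_apples_1_20220101_000000.txt   # excluded by the regex

# -maxdepth 1 keeps find in the one directory; -printf '%f\0' emits
# NUL-terminated basenames, so sort sees the same field numbers whatever $dir is.
# Fixed-width YYYYMMDD/HHMMSS values sort chronologically as plain strings;
# LC_ALL=C makes sort compare bytes, which is fast.
mapfile -d '' -t oldest < <(
    find "$dir" -maxdepth 1 -type f -regextype posix-egrep \
        -regex '.*/prefix_(cats|dogs)_(apples|oranges|tomatos)_[23]_[0-9]{8}_[0-9]{6}\.txt' \
        -printf '%f\0' |
    LC_ALL=C sort -z -t_ -k5,5 -k6,6 |
    head -z -n "$num_of_files"
)

# Placeholder for "performs certain operations on them".
for f in "${oldest[@]}"; do
    echo "processing $dir/$f"
done
```

Because YYYYMMDD and HHMMSS are fixed-width, plain string comparison already orders them chronologically, so no date parsing is needed; -maxdepth 1 and LC_ALL=C are there to keep the work down when there are hundreds of thousands of files.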

User contributions licensed under: CC BY-SA