I am writing a bash script that goes over all files in a certain directory and:
- Picks the files with names that match a specified pattern
- Sorts them by date and time (date and time are part of the filename)
- Takes X oldest files
- Performs certain operations on them
The pattern used to match the files is passed to the script and looks like:
someprefix_[cats|dogs]_[oranges|apples|tomatos]_[2|3]*.txt
I tried to implement it as follows (fields 6 and 7 of the pattern are assumed to contain date and time):
FILES=`find . -name "$PATTERN" | sort -t_ -k6 | head -n $NUM_OF_FILES`
It doesn’t work. I tried various options with -name and -regex…
Most examples online are for much less complicated patterns.
Since there might be hundreds of thousands of files to go through, I am looking for a solution that works efficiently.
I would like to avoid using sed for readability reasons.
Answer
Your find regex must match the entire path returned by find. For example, if you are searching somedir/ for your files, then your regex must match, e.g.
somedir/prefix_cats_apples_2.txt
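The whole-path rule is easy to check in a scratch directory. The following is a minimal sketch, assuming GNU find (for -regextype); the temporary directory and the variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Sketch only: demo_dir and the sample file are hypothetical.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/somedir"
touch "$demo_dir/somedir/prefix_cats_apples_2.txt"
cd "$demo_dir" || exit 1

# Regex describes only the file name: no match, because find compares it
# against the whole path "somedir/prefix_cats_apples_2.txt".
name_only=$(find somedir/ -regextype posix-egrep \
                -regex 'prefix_cats_apples_2\.txt')

# Regex that also covers the leading directory: match.
full_path=$(find somedir/ -regextype posix-egrep \
                -regex 'somedir/prefix_cats_apples_2\.txt')

# A leading ".*/" keeps the regex independent of the starting directory.
any_dir=$(find somedir/ -regextype posix-egrep \
              -regex '.*/prefix_cats_apples_2\.txt')

printf 'name only: [%s]\n' "$name_only"
printf 'full path: [%s]\n' "$full_path"
printf 'any dir:   [%s]\n' "$any_dir"
```

The ".*/" form is usually the more convenient one in a script, since it keeps working when the search root changes.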
Complicating the picture, there are multiple types of regex you can use by changing the -regextype option to find, e.g. emacs (default), posix-awk, posix-basic, posix-egrep, posix-extended (posix-basic has no alternation capability). posix-egrep is probably the most transferable between tools like grep, sed, find, etc.
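To illustrate the difference between basic and extended syntax, here is a small sketch that uses grep instead of find, so nothing on disk is touched. The file names and the regex are made up for the example; grep -E accepts extended syntax, while plain grep uses basic syntax, where ( | ) are literal characters.

```shell
#!/usr/bin/env bash
# Hypothetical file names, used only to exercise the regex.
names='prefix_cats_apples_2.txt
prefix_dogs_apples_3.txt
prefix_birds_apples_2.txt'

regex='^prefix_(cats|dogs)_'

# Extended syntax: ( | ) group and alternate, so cats and dogs both match.
extended=$(printf '%s\n' "$names" | grep -E "$regex")

# Basic syntax: ( | ) are literal characters, so nothing matches here.
basic=$(printf '%s\n' "$names" | grep "$regex" || true)

printf 'extended:\n%s\n' "$extended"
printf 'basic: [%s]\n' "$basic"
```

Note that grep matches anywhere in the line unless the regex is anchored, whereas find's -regex always has to match the whole path.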
A posix-egrep regex for your pattern, searching for the files in somedir/, would be:
'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$'
To test it against your filenames, the following example files were used (the ending number ranges over 0-3 to show that files ending in 0 or 1 are excluded):
$ ls -1 somedir/
prefix_cats_apples_0.txt
prefix_cats_apples_1.txt
prefix_cats_apples_2.txt
prefix_cats_apples_3.txt
prefix_cats_oranges_0.txt
prefix_cats_oranges_1.txt
prefix_cats_oranges_2.txt
prefix_cats_oranges_3.txt
prefix_cats_tomatos_0.txt
prefix_cats_tomatos_1.txt
prefix_cats_tomatos_2.txt
prefix_cats_tomatos_3.txt
prefix_dogs_apples_0.txt
prefix_dogs_apples_1.txt
prefix_dogs_apples_2.txt
prefix_dogs_apples_3.txt
prefix_dogs_oranges_0.txt
prefix_dogs_oranges_1.txt
prefix_dogs_oranges_2.txt
prefix_dogs_oranges_3.txt
prefix_dogs_tomatos_0.txt
prefix_dogs_tomatos_1.txt
prefix_dogs_tomatos_2.txt
prefix_dogs_tomatos_3.txt
Matching only the files that satisfy your criteria and piping the result through a plain sort yields:
$ find somedir/ -regextype posix-egrep -regex 'somedir/prefix_(cats|dogs)_(apples|oranges|tomatos).*[23].*$' | sort
somedir/prefix_cats_apples_2.txt
somedir/prefix_cats_apples_3.txt
somedir/prefix_cats_oranges_2.txt
somedir/prefix_cats_oranges_3.txt
somedir/prefix_cats_tomatos_2.txt
somedir/prefix_cats_tomatos_3.txt
somedir/prefix_dogs_apples_2.txt
somedir/prefix_dogs_apples_3.txt
somedir/prefix_dogs_oranges_2.txt
somedir/prefix_dogs_oranges_3.txt
somedir/prefix_dogs_tomatos_2.txt
somedir/prefix_dogs_tomatos_3.txt
Since you didn’t provide an example of where the time/date was in the filenames, the sorting by time/date is left to you. Let me know if you have further questions.
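As a starting point for the sorting step, here is a rough sketch under stated assumptions. The filename layout prefix_<animal>_<fruit>_<YYYYMMDD>_<HHMMSS>.txt is purely hypothetical (your real names may put date and time in other fields, so adjust the -k options), the sample files and work_dir are invented, the regex is a tightened variant that requires the field after the fruit to start with 2 or 3, and GNU find is assumed for -regextype and -printf.

```shell
#!/usr/bin/env bash
# Assumed (hypothetical) layout: prefix_<animal>_<fruit>_<YYYYMMDD>_<HHMMSS>.txt
work_dir=$(mktemp -d)
mkdir "$work_dir/somedir"
for name in \
    prefix_cats_apples_20230301_120000.txt \
    prefix_dogs_oranges_20230115_093000.txt \
    prefix_cats_tomatos_20230115_081500.txt \
    prefix_dogs_apples_20221231_235959.txt \
    prefix_birds_apples_20200101_000000.txt
do
    touch "$work_dir/somedir/$name"
done

NUM_OF_FILES=3

# -printf '%f\n' prints the bare file name, so directory names containing "_"
# cannot shift the field numbers that sort relies on.
# sort: field 4 (date) is the primary key, field 5 (time) breaks ties.
oldest=$(find "$work_dir/somedir" -regextype posix-egrep \
            -regex '.*/prefix_(cats|dogs)_(apples|oranges|tomatos)_[23][^/]*\.txt' \
            -printf '%f\n' \
         | sort -t_ -k4,4 -k5,5 \
         | head -n "$NUM_OF_FILES")

printf '%s\n' "$oldest"
```

Fixed-width YYYYMMDD and HHMMSS stamps sort chronologically as plain text, which is why no numeric or date-aware sort option is needed here. The pipeline walks the directory once and sorts only the matching names, which should scale reasonably to large directories; it does assume file names contain no newlines.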