
Bash Scripting – REGEX to dump a file list

I have 4 file extensions, the result of previous work, stored in the SEARCH array as follows:

declare -a SEARCH=("toggled" "jtr" "jtr.toggled" "cupp")

I want to produce one file list for each of the 4 extension patterns above, as shown below. Files with 2 dots and 2 extensions (marked "NO") must not appear in the list for a single extension:

################################################################################
1 - SEARCH FOR toggled in /media
regex   : ([^/]+)(\.)(toggled)$
command : find /media -type f | grep --color -P ([^/]+)(\.)(toggled)$
################################################################################
/media/myfile_1.jtr.toggled --> NO
/media/myfile_1.toggled
/media/myfile_2.jtr.toggled --> NO
/media/myfile_2.toggled
/media/myfile_3.jtr.toggled --> NO
/media/myfile_3.toggled


################################################################################
2 - SEARCH FOR jtr in /media
regex   : ([^/]+)(\.)(jtr)$
command : find /media -type f | grep --color -P ([^/]+)(\.)(jtr)$
################################################################################
/media/myfile_1.jtr
/media/myfile_2.jtr
/media/myfile_3.jtr


################################################################################
3 - SEARCH FOR jtr.toggled in /media
regex   : ([^/]+)(\.)(jtr.toggled)$
command : find /media -type f | grep --color -P ([^/]+)(\.)(jtr.toggled)$
################################################################################
/media/myfile_1.jtr.toggled
/media/myfile_2.jtr.toggled
/media/myfile_3.jtr.toggled


################################################################################
4 - SEARCH FOR cupp in /media
regex   : ([^/]+)(\.)(cupp)$
command : find /media -type f | grep --color -P ([^/]+)(\.)(cupp)$
################################################################################
/media/myfile_1.cupp
/media/myfile_2.cupp
/media/myfile_3.cupp

Obviously I spent hours on regex101 without success. I also tried to reach my goal with other methods, but they do not fit with the rest of the code.

Here is a code extract :

for ext in "${SEARCH[@]}"
do

    COUNTi=$((COUNTi+1))

    REGEX="([^/]+)(\.)("$ext")$" #
    # Ideally, the Regex should come from a pattern array

    printf '%*s' "$len" | tr ' ' "$mychar"
    echo -e "\n$COUNTi - SEARCH FOR $ext in $BASEDIR"
    echo "regex   : $REGEX"
    echo "command : find $BASEDIR -type f | grep --color -P $REGEX"
    printf '%*s' "$len" | tr ' ' "$mychar" && echo

    find $BASEDIR -type f | grep --color -P $REGEX
    # Problem: the regex does not handle double-dot extensions correctly.

    echo -e "\n"

done

So my 2 questions, both about the same piece of code:

  1. REGEX: what would be a correct regex to list the files by extension family (please see the 4 SEARCH patterns and related dumps)?

  2. ARRAYS: once the point above is solved, how can I use a pattern array, whose entries contain the $ext placeholder, inside the looped REGEX?

     PATTERN+=( "([^/]+)(\.)($ext)$" )
    # All of these below : CAVEATS escaping $ or not...
    # REGEX=${PATTERN[5]}
    # REGEX=$(eval "${PATTERN[5]}" )
    # echo "pattern : ${PATTERN[5]}"
    # eval "$REGEX=$REGEX"
    # eval "$REGEX="$REGEX""
    # REGEX=$(echo "${REGEX}")
    # REGEX=${!PATTERN[5]}
    

Notes: I read regex documentation for hours and tried hundreds of patterns without success, as I can't understand the rationale behind these regexes.
I also tried other approaches, for example find / -name "sayONEnameinmysearchpattern" ! -iname "theothernamesfromthesearchpattern". This is not what I'm looking for.

Thx


Answer

Change the REGEX line in your code to:

REGEX='^(.*/|)[^/.]+\.'"$ext\$"

The Perl-compatible regular expression that matches the basename of the file is in single quotes. This prevents the shell from trying to expand it. The $ext is in double quotes, so it will be expanded by the shell. The trailing $ is escaped with a backslash just for form: a $ directly before the closing double quote would stay literal anyway.
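The same quoting rule suggests one possible route for the second question (patterns kept in an array). The sketch below is an assumption rather than part of the original answer: each template sits in single quotes with a made-up @EXT@ token where the extension belongs, and a parameter substitution fills it in, so no eval is needed. PATTERNS, @EXT@ and build_regex are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: keep regex templates in an array and fill in the extension later.
# PATTERNS, the @EXT@ token and build_regex are hypothetical names.

# Single quotes keep the shell from touching \, $ or () inside the templates.
declare -a PATTERNS=(
    '^(.*/|)[^/.]+\.@EXT@$'   # exactly one extension after the base name
    '\.@EXT@$'                # looser: any name ending in .EXT
)

# build_regex INDEX EXT: print template INDEX with @EXT@ replaced by EXT.
build_regex() {
    local template=${PATTERNS[$1]}
    printf '%s\n' "${template//@EXT@/$2}"
}

declare -a SEARCH=("toggled" "jtr" "jtr.toggled" "cupp")
for ext in "${SEARCH[@]}"; do
    build_regex 0 "$ext"
done
```

Inside the loop, REGEX=$(build_regex 0 "$ext") would then take the place of the hard-coded assignment.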

The leading ^(.*/|) will match a leading directory (ending with /), the [^/.]+ will match one or more characters that are NOT ‘.’ or ‘/’. That must then be followed by a ‘.’ and your extension, followed by the end of the file name ($) to match.
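To see each part of the pattern at work, here is a small sketch that runs it over sample names from the question instead of a real /media tree. The paths array and the match_ext helper are hypothetical, and grep -P assumes a GNU grep built with PCRE support.

```bash
#!/usr/bin/env bash
# Sample paths taken from the question; no real files are needed.
paths=(
    /media/myfile_1.jtr.toggled
    /media/myfile_1.toggled
    /media/myfile_1.jtr
    /media/myfile_1.cupp
)

# match_ext EXT: print the sample paths whose only extension is EXT.
match_ext() {
    local regex='^(.*/|)[^/.]+\.'"$1\$"
    printf '%s\n' "${paths[@]}" | grep -P "$regex"
}

match_ext toggled       # prints only /media/myfile_1.toggled
match_ext jtr.toggled   # prints only /media/myfile_1.jtr.toggled
```

For "toggled", the name myfile_1.jtr.toggled is rejected because [^/.]+ stops at the first dot, and what follows that dot is "jtr.toggled", not "toggled".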

The key here is to anchor your match at both ends (^ and $) and not allow any dots ‘.’ except the ones you really want.
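One dot is still unguarded: the one inside the jtr.toggled extension itself, which the regex engine reads as "any character". A possible refinement, not included in the original answer, is to escape the dots in $ext before building the pattern. escape_dots and matches are hypothetical helper names, and only dots are handled, not other metacharacters.

```bash
#!/usr/bin/env bash
# escape_dots STRING: print STRING with every "." turned into "\.".
escape_dots() {
    printf '%s\n' "$1" | sed 's/\./\\./g'
}

# matches REGEX PATH: succeed if PATH matches REGEX (requires grep -P).
matches() {
    printf '%s\n' "$2" | grep -qP "$1"
}

ext='jtr.toggled'
loose='^(.*/|)[^/.]+\.'"$ext\$"
strict='^(.*/|)[^/.]+\.'"$(escape_dots "$ext")\$"

# A made-up name with "_" where the inner dot should be.
odd='/media/myfile_1.jtr_toggled'

if matches "$loose" "$odd"; then
    echo "loose pattern accepts $odd"
fi
if ! matches "$strict" "$odd"; then
    echo "strict pattern rejects $odd"
fi
```

With the extensions listed in the question this difference rarely matters in practice, but it keeps the pattern honest about which dots are literal.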

You might also want to quote $REGEX, writing "$REGEX", in the grep command near the end of your code extract, so that the shell passes the pattern to grep as a single, unmodified argument.
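Put together, the loop might look like the sketch below. It assumes GNU find and a grep with -P support, uses a throwaway directory from mktemp in place of /media, and wraps the pipeline in a hypothetical list_by_ext function. Left unquoted, the pattern's * and [ ] characters would be exposed to word splitting and pathname expansion.

```bash
#!/usr/bin/env bash
# Self-contained run: builds a throwaway directory instead of scanning /media.
BASEDIR=$(mktemp -d)
touch "$BASEDIR"/myfile_1.{toggled,jtr,jtr.toggled,cupp}

declare -a SEARCH=("toggled" "jtr" "jtr.toggled" "cupp")

# list_by_ext DIR EXT: print files under DIR whose only extension is EXT.
list_by_ext() {
    local regex='^(.*/|)[^/.]+\.'"$2\$"
    # Quoting "$1" and "$regex" stops word splitting and glob expansion.
    find "$1" -type f | grep -P "$regex"
}

for ext in "${SEARCH[@]}"; do
    echo "SEARCH FOR $ext in $BASEDIR"
    list_by_ext "$BASEDIR" "$ext"
done

# Clean up the temporary files.
rm -rf "$BASEDIR"
```

Each pass prints exactly one file here, because the sample directory holds one file per extension family.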
