
bash – print regex captured groups

I have a file.xml so composed:

...some xml text here...
    <Version>1.0.13-alpha</Version>
...some xml text here...

I need to extract the following information:

  • major_and_minor_release_number –> 1.0
  • patch_number –> 13
  • suffix –> -alpha

I thought the cleanest way to achieve that would be by means of a regex with the grep command:

<Version>(\d+\.\d+)\.(\d+)([\w-]+)?</Version>

I’ve checked with regex101 the correctness of this regex and actually it seems to properly capture the 3 fields I’m looking for. But here comes the problem, since I have no idea how to print those fields.

cat file.xml | grep "<Version>(\d+\.\d+)\.(\d+)([\w-]+)?</Version>" -oP

This command prints the entire line so it’s quite useless.

Several posts on this site have been written about this topic, so I’ve also tried to use the bash native regex support, with poor results:

regex="<Version>(\d+\.\d+)\.(\d+)([\w-]+)?</Version>"
txt=$(cat file.xml)
[[ "$txt" =~ $regex ]]     # --> it fails!
echo "${BASH_REMATCH[*]}"

I’m sorry but I cannot figure out how to overcome this issue. The desired output should be:

1.0
13
-alpha

Answer

You may use this read + sed solution with a regex similar to yours:

read -r major minor suffix < <(
sed -nE 's~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version>.*~\1 \2 \3~p' file.xml
)

Check variable contents:

declare -p major minor suffix

declare -- major="1.0"
declare -- minor="13"
declare -- suffix="-alpha"
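
The sed pattern above requires a suffix starting with a hyphen, whereas the regex in the question marks the suffix group as optional. A minimal sketch of that variant follows, assuming a version such as 1.0.13 with no suffix may also occur; parse_version is a hypothetical helper name, and the input is fed from a here-string instead of file.xml so the snippet is self-contained.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): reads XML on stdin and
# prints "major_minor patch suffix" on one line. The trailing "?" makes the
# suffix group optional, so an unmatched \3 expands to an empty string.
parse_version() {
    sed -nE 's~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)?</Version>.*~\1 \2 \3~p'
}

# Assumed sample input without a suffix.
read -r major minor suffix < <(parse_version <<< '    <Version>1.0.13</Version>')

declare -p major minor suffix
```

With this input, suffix should end up as an empty string, while major and minor are filled in as before.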

A few points:

  • You cannot use \d without using -P (perl) mode in grep
  • The grep command doesn’t print capture groups separately; with -o it prints the whole match
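
The same restriction on \d applies to bash's own =~ operator, which uses POSIX extended regular expressions. That is a likely reason the attempt in the question fails. The sketch below is one possible way to get the three fields with bash alone, replacing \d with [0-9] and [\w-] with [[:alnum:]_-]. The sample text, including the <Project> wrapper, is an assumption standing in for the contents of file.xml.

```shell
#!/usr/bin/env bash
# Sample text standing in for the contents of file.xml (an assumption for
# this sketch; the <Project> wrapper is hypothetical).
txt='<Project>
    <Version>1.0.13-alpha</Version>
</Project>'

# POSIX ERE has no \d or \w: use [0-9] and [[:alnum:]_-] instead.
regex='<Version>([0-9]+\.[0-9]+)\.([0-9]+)([[:alnum:]_-]+)?</Version>'

# The pattern variable must stay unquoted on the right-hand side of =~,
# otherwise bash treats it as a literal string.
if [[ $txt =~ $regex ]]; then
    major_minor=${BASH_REMATCH[1]}
    patch=${BASH_REMATCH[2]}
    suffix=${BASH_REMATCH[3]}   # empty when the version has no suffix
    printf '%s\n' "$major_minor" "$patch" "$suffix"
else
    echo "no <Version> element found" >&2
fi
```

BASH_REMATCH[0] holds the whole match, and indices 1 to 3 hold the capture groups, so no external command is needed for this approach.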
User contributions licensed under: CC BY-SA