I apologize beforehand for the jumbled mess that is the title, but that’s the shortest way I could think of to describe what I’m trying to do.
I’m reading a file with multiple lines of text, looping through them, and trying to use a regex to get a substring from each line. Each line starts with “name: ”, followed by some series of letters and possibly hyphens. After that, there may be a ‘#’ followed by digits, a ‘-’ followed by digits, or a newline. I only want to capture the letters and any hyphens between them. Below is what I’ve tried, with input, output, and intended output. This regex is being run in a Linux bash script.
Regex: `name: (.[^#rnd]*)`

| Input | Captured output | Intended output |
|---|---|---|
| name: foo-bar#2.3.2 | foo-bar | foo-bar |
| name: bar-foo-4.2 | bar-foo- | bar-foo |
| name: foobar | foobar | foobar |
| name: far-far | far-far | far-far |
Code sample:

    fileRegex="name: (.[^\#rnd]*)"
    for i in "${fileList[@]}"
    do
        if [[ $i =~ $fileRegex ]]; then
            fixedLine="${BASH_REMATCH[1]}"
            echo "$fixedLine"
        fi
    done

From the table, the offending instance is “name: bar-foo-4.2”, which should output only “bar-foo” but instead outputs “bar-foo-”. What I’m trying to figure out is how to stop capturing when there is a “-” followed by any digits, while keeping the outputs of all the other examples unchanged.
Answer
In bash, you may try this code:

    declare -a arr=([0]="name: foo-bar#2.3.2" [1]="name: bar-foo-4.2" [2]="name: foobar" [3]="name: far-far")
    fileRegex='name: ([[:alpha:]]+(-[[:alpha:]]+)*)'
    for s in "${arr[@]}"; do
        [[ $s =~ $fileRegex ]] && echo "${BASH_REMATCH[1]}"
    done

Output:

    foo-bar
    bar-foo
    foobar
    far-far

RegEx Explained:

- `name: `: Match "name: "
- `(`: First capture group start
- `[[:alpha:]]+`: Match 1+ alphabetic characters
- `(-[[:alpha:]]+)*`: Match 0 or more substrings, each a hyphen followed by 1+ alphabetic characters
- `)`: First capture group end
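
Since the question reads its lines from a file, the same pattern could be wrapped in small helper functions. The following is only a sketch under a few assumptions: the function names `extract_name` and `extract_names` are hypothetical, and the leading `^` anchor is an addition not present in the answer's regex (it assumes every relevant line begins with "name: ", as the question describes).

```shell
#!/usr/bin/env bash
# Sketch only: extract_name / extract_names are hypothetical helper names.

# Letters, optionally followed by more hyphen-separated runs of letters.
# The leading ^ is an assumption: lines are expected to start with "name: ".
name_regex='^name: ([[:alpha:]]+(-[[:alpha:]]+)*)'

# Print the captured name for one line; return 1 if the line does not match.
extract_name() {
  local line=$1
  if [[ $line =~ $name_regex ]]; then
    printf '%s\n' "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

# Read lines from standard input and print the name from each matching line.
# The extra [[ -n $line ]] check handles a final line with no trailing newline.
extract_names() {
  local line
  while IFS= read -r line || [[ -n $line ]]; do
    extract_name "$line" || true   # silently skip non-matching lines
  done
}
```

It could then be used as `extract_names < input.txt`, where `input.txt` is a placeholder for the file being read. Because a hyphen is only consumed when letters follow it, the capture stops before a "-" that is followed by digits.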