
Sed – How to Print Regex Groups Across Multiple Lines?

Input file (test):

123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2"
href="link2">This is link2</a>0123

Desired output:

link1
link2

What I have done:

$ sed -e '/<a/{:begin;/<\/a>/!{N;b begin};s/<a\([^<]*\)<\/a>/QQ/;/<a/b begin}' test
123456QQ789QQ0123

Question: How do you print the captured regex groups with sed when a match spans multiple lines?


Answer

If you use sed like this:

sed -e '/<a/{:begin;/<\/a>/!{N;b begin};s/<a\([^<]*\)<\/a>/\n/;/<a/b begin}'

then each piece of text outside the <a ...>...</a> elements is printed on its own line:

123456
789
0123

But is this what you are trying to print, or do you want to print the values of the href attributes?
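If the goal is the href values and each one sits on a single line, as in the sample, a plain `sed -n` with a capture group and the `p` flag may already be enough. In this minimal sketch the `printf` line only recreates the sample file `test`; `\(...\)` captures the href value and `\1` puts just that group in the output. Because `.*` is greedy it keeps only the last href on each line, and it does not look across line boundaries.

```shell
# Recreate the sample input from the question in the file "test".
printf '%s\n' \
  '123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2"' \
  'href="link2">This is link2</a>0123' > test

# -n suppresses automatic printing; the p flag prints only lines where s/// matched.
# \1 in the replacement is the captured href value (greedy .* keeps the last one per line).
sed -n 's/.*href="\([^"]*\)".*/\1/p' test
```

On the sample input this prints link1 and link2.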

Update 1: To get the hrefs from well-formed <a ...>...</a> elements:

sed -r '$!N; s~\n~~; s~(<a )~\n\1~ig; s~[^<]*<a[^>]*href\s*=\s*"([^"]*)"[^\n]*~\1\n~ig' test

output

link1
link2
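Note that `$!N` appends only one extra line, so the command above reads the input two lines at a time; in a longer file, a tag split between lines 2 and 3 would be missed. Below is a minimal sketch of a variant that slurps the whole file first, assuming GNU sed (for `-E` and `\n` in patterns and replacements) and double-quoted href values; `extract_hrefs` is a hypothetical helper name and the `printf` line only recreates the sample file `test`. Like any regex approach it is not an HTML parser: for example, it would also split on attributes such as `data-href`.

```shell
# Recreate the sample input from the question in the file "test".
printf '%s\n' \
  '123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2"' \
  'href="link2">This is link2</a>0123' > test

# Hypothetical helper; assumes GNU sed.
extract_hrefs() {
  sed -nE '
    # Slurp the whole file: keep appending lines until the last one is read.
    :a
    $!N
    $!ba
    # Join physical lines so a tag split across lines becomes one string.
    s/\n/ /g
    # Start a new segment (after a newline) at every href attribute.
    s/href[[:space:]]*=/\n&/g
    # Reduce each href segment to just its quoted value (capture group 1).
    s/\nhref[[:space:]]*=[[:space:]]*"([^"]*)"[^\n]*/\n\1/g
    # Drop the text before the first href; p prints only if that succeeded.
    s/^[^\n]*\n//p
  ' "$1"
}

extract_hrefs test
```

Because the final `s///p` prints only when at least one href was found, input without links produces no output.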

Update 2: Getting the same output with bash's built-in regex matching (=~)

regex='href="([^"]*)"'
# IFS= and -r read each line verbatim; BASH_REMATCH[1] holds capture group 1.
while IFS= read -r line; do
   [[ $line =~ $regex ]] || continue
   printf '%s\n' "${BASH_REMATCH[1]}"
done < test

output

link1
link2
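The loop above prints at most one href per line, because `=~` reports only the first match. A minimal sketch that keeps matching against the rest of the line after each hit, assuming bash (for `[[ =~ ]]` and `BASH_REMATCH`); `print_all_hrefs` is a hypothetical helper name and the `printf` line only recreates the sample file `test`.

```shell
#!/usr/bin/env bash
# Recreate the sample input from the question in the file "test".
printf '%s\n' \
  '123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2"' \
  'href="link2">This is link2</a>0123' > test

# Hypothetical helper; requires bash.
print_all_hrefs() {
  local regex='href="([^"]*)"' line
  # IFS= and -r keep each line verbatim; the || test handles a missing final newline.
  while IFS= read -r line || [[ -n $line ]]; do
    # Repeat the match on whatever follows the previous hit.
    while [[ $line =~ $regex ]]; do
      printf '%s\n' "${BASH_REMATCH[1]}"
      # Strip everything up to and including the matched text (quoted, so literal).
      line=${line#*"${BASH_REMATCH[0]}"}
    done
  done < "$1"
}

print_all_hrefs test
```

Each match contains at least `href=""`, so every pass shortens the line and the inner loop terminates.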
User contributions licensed under: CC BY-SA