Input file (test):
123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2" href="link2">This is link2</a>0123
Desired output:
link1 link2
What I have done:
$ sed -e '/<a/{:begin;/<\/a>/!{N;b begin};s/<a\([^<]*\)<\/a>/QQ/;/<a/b begin}' test
123456QQ789QQ0123
Question: How do you print the regex groups in sed (multiline)?
Answer
If you use sed like this:
sed -e '/<a/{:begin;/<\/a>/!{N;b begin};s/<a\([^<]*\)<\/a>/\n/;/<a/b begin}'
then it will print in different lines:
123456
789
0123
But is this what you are trying to print, or do you want to print the href values?
Update 1: To get the hrefs between well-formed <a and </a> tags:
sed -r '$!N; s~\n~~; s~(<a )~\n\1~ig; s~[^<]*<a[^>]*href\s*=\s*"([^"]*)"[^\n]*~\1\n~ig' test
output
link1
link2
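As for the general question of printing only a regex group in sed: the usual idiom is `-n` together with the `p` flag on `s///`, with the group referenced as `\1`. Below is a minimal sketch of that idiom, not a drop-in replacement for the command above. It assumes a `grep` that supports `-o` (GNU and BSD both do) and that each `href="..."` attribute sits on a single line. The function name `print_hrefs` is made up for illustration.

```shell
# Hypothetical helper: print the value of every href="..." read from stdin.
print_hrefs() {
  # grep -o emits each match on its own line, so several links on one
  # input line are split apart before sed sees them.
  grep -o 'href="[^"]*"' |
    # -n suppresses sed's default output; the p flag prints only the lines
    # where the substitution succeeded, leaving just capture group 1.
    sed -n 's/^href="\([^"]*\)"$/\1/p'
}

# Sample input taken from the question.
print_hrefs <<'EOF'
123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2" href="link2">This is link2</a>0123
EOF
# prints:
# link1
# link2
```

Since the match is per attribute rather than per tag, it does not matter whether the `<a ...>` and `</a>` end up on different lines.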
Update 2: Getting the same output using bash's regex matching:
regex='href="([^"]*)"'
while read -r line; do
    [[ $line =~ $regex ]] || continue
    echo "${BASH_REMATCH[1]}"
done < test
output
link1
link2
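One caveat with the loop in Update 2: `=~` reports only the first match, so a line that holds several links yields a single href. If your file can have more than one `<a>` per line, a rough sketch of a workaround is to strip the matched text from the line and try again. The function name `extract_hrefs` is invented here, and the code assumes bash (it will not run under plain `sh`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every href value on every line of stdin.
extract_hrefs() {
  local regex='href="([^"]*)"' line
  # `|| [[ -n $line ]]` also processes a last line with no trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    while [[ $line =~ $regex ]]; do
      printf '%s\n' "${BASH_REMATCH[1]}"
      # Drop everything up to and including this match, then retry on the rest.
      line=${line#*"${BASH_REMATCH[0]}"}
    done
  done
}

# Sample input taken from the question.
extract_hrefs <<'EOF'
123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2" href="link2">This is link2</a>0123
EOF
# prints:
# link1
# link2
```

To run it against the file from the question instead of the inline sample, call `extract_hrefs < test`.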