Input file (test):
123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2"
href="link2">This is link2</a>0123
Desired output:
link1
link2
What I have done:
$ sed -e '/<a/{:begin;/<\/a>/!{N;b begin};s/<a\([^<]*\)<\/a>/QQ/;/<a/b begin}' test
123456QQ789QQ0123
Question: How do you print the regex groups in sed (multiline)?
Answer
If you use sed like this:
sed -e '/<a/{:begin;/<\/a>/!{N;b begin};s/<a\([^<]*\)<\/a>/\n/;/<a/b begin}' test
then each piece of text outside the anchors is printed on its own line:
123456
789
0123
But is this what you are trying to print, or do you want to print the text in the href attributes?
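If the goal is the captured text itself, a backreference is the usual tool. The following is a minimal sketch, assuming a single anchor that sits on one line; the helper name `print_group` and the sample string are made up for illustration. `-n` suppresses sed's default output and the `p` flag prints only lines where the substitution matched, so only group `\1` is shown.

```shell
# Hypothetical helper: print what \( ... \) captured between "<a" and "</a>".
print_group() {
    # .* on both sides swallows the text around the anchor;
    # \1 is the only thing left in the replacement.
    sed -n 's/.*<a\([^<]*\)<\/a>.*/\1/p'
}

# Illustrative single-line input (not the multiline file from the question).
printf '%s\n' '12<a href="link1">This is link1</a>34' | print_group
```

This prints everything between `<a` and `</a>`, including the space before `href`; the multiline case still needs the `N` loop shown earlier to pull the second line into the pattern space first.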
Update 1: To get the href values from well-formed <a ...> and </a> pairs (GNU sed):
sed -r '$!N; s~\n~~; s~(<a )~\n\1~ig; s~[^<]*<a[^>]*href\s*=\s*"([^"]*)"[^\n]*~\1\n~ig' test
output
link1
link2
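The one-liner above depends on GNU sed extensions (`-r`, `\s`, and the `I` flag). As an alternative sketch, assuming a `grep` that supports `-o` (GNU and BSD grep do), the lines can be joined first and the capture group applied in a final, much simpler sed step. The function name `extract_hrefs` is hypothetical.

```shell
# Recreate the sample input from the question.
cat > test <<'EOF'
123456<a id="id1" name="name1" href="link1">This is link1</a>789<a id="id2"
href="link2">This is link2</a>0123
EOF

# Hypothetical helper: print every href value in file $1, one per line.
extract_hrefs() {
    tr '\n' ' ' < "$1" |                  # join lines so a tag split across lines is whole
        grep -o 'href="[^"]*"' |          # one href="..." attribute per output line
        sed 's/^href="\(.*\)"$/\1/'       # keep only the captured value (group 1)
}

extract_hrefs test
```

Joining the lines up front sidesteps the `N` loop entirely, at the cost of reading the whole input as one long line. Like the sed version, it matches `href` anywhere in the text, not only inside `<a>` tags.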
Update 2: Getting above output using bash regex feature
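regex='href="([^"]*)"'
# Read raw lines; also handle a last line that has no trailing newline.
while IFS= read -r line || [[ -n $line ]]; do
    [[ $line =~ $regex ]] || continue    # skip lines without an href
    printf '%s\n' "${BASH_REMATCH[1]}"   # group 1: the attribute value
done < test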
output
link1
link2
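The loop above reports at most one href per input line, which happens to be enough for this file. A hedged sketch of one way to handle several hrefs on the same line, assuming bash (it uses `[[ =~ ]]` and `BASH_REMATCH`): after each match, cut the line down to what follows the match and try again. The function name `extract_all` and the file `multi` are made up for illustration.

```shell
regex='href="([^"]*)"'

# Hypothetical helper: print every href value in file $1, even when
# one line contains several anchors.
extract_all() {
    local line rest
    while IFS= read -r line || [[ -n $line ]]; do
        rest=$line
        while [[ $rest =~ $regex ]]; do
            printf '%s\n' "${BASH_REMATCH[1]}"
            # Remove the shortest prefix ending in the matched text,
            # so the next iteration searches only the remainder.
            rest=${rest#*"${BASH_REMATCH[0]}"}
        done
    done < "$1"
}

# Illustrative input with two anchors on one line.
printf '%s\n' '<a href="a">x</a><a href="b">y</a>' > multi
extract_all multi
```

Quoting `${BASH_REMATCH[0]}` inside the `${rest#...}` expansion makes bash treat the matched text literally instead of as a glob pattern.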