I want to leave the unknown content between tags intact, but want to match all tags that use:
<div class="section1-title">arbitrary content here</div>
and replace the surrounding tags with:
<h2>arbitrary content here</h2>
I’ve come up with the following, but it’s not working: in the second part, sed literally substitutes “].*[<]/h2[>]” for each match found.
sed -i 's/[<]div class="section1-title"[>].*[<]\/div[>]/<h2[>].*[<]\/h2[>]/g'
I’d specifically like to know how to leave that middle content intact, no matter what is in there, and match only the surrounding tags. Many other elements share the same closing tag, so I can’t just search and replace the opening and closing tags separately. The first part of the sed statement does seem to match the right content as far as I can tell; it’s mostly the second part that I’m unsure of.
Answer
What you need is a backref.
bash-3.2$ sed 's/<div class="section1-title">\(.*\)<\/div>/<h2>\1<\/h2>/g' <<< '<div class="section1-title">arbitrary content here</div>'
<h2>arbitrary content here</h2>
The escaped parentheses around your content, \(.*\), capture it as a group so that it can be referenced later, unchanged, with \1.
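As a minimal sketch of the same idea, the snippet below runs the substitution in both basic (default) and extended (-E) regex mode. It assumes a sed that supports -E (GNU and BSD sed both do) and uses | as the s-command delimiter so the slashes in the closing tags need no escaping. The variable names are only illustrative.

```shell
# Sample input; the class name is the one from the question.
input='<div class="section1-title">arbitrary content here</div>'

# Basic regex (sed's default): the capture group is written \( ... \).
bre=$(printf '%s\n' "$input" | sed 's|<div class="section1-title">\(.*\)</div>|<h2>\1</h2>|g')

# Extended regex (-E): plain parentheses capture, no backslashes needed.
ere=$(printf '%s\n' "$input" | sed -E 's|<div class="section1-title">(.*)</div>|<h2>\1</h2>|g')

# Both forms print the same rewritten line.
printf '%s\n%s\n' "$bre" "$ere"
```

Either form can be combined with -i once the output looks right; running without -i first is a cheap way to preview the result.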
See: https://www.regular-expressions.info/backref.html
and the question ".bash_profile sed: \1 not defined in the RE" for an explanation of why the parentheses should be escaped in your regex.
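One caveat worth checking, sketched below: .* is greedy, so if two matching divs sit on the same line, the group runs from the first opening tag to the last closing tag. Narrowing the group to [^<]* avoids that, under the assumption (not stated in the question) that the title text never contains nested tags. The sample line and variable names are hypothetical.

```shell
# Hypothetical line with two title divs side by side.
line='<div class="section1-title">One</div><div class="section1-title">Two</div>'

# Greedy group: swallows everything up to the LAST </div> on the line.
greedy=$(printf '%s\n' "$line" | sed 's|<div class="section1-title">\(.*\)</div>|<h2>\1</h2>|g')

# Narrowed group: stops at the first "<", so each div is rewritten on its own.
# Assumes the captured text contains no nested tags.
narrow=$(printf '%s\n' "$line" | sed 's|<div class="section1-title">\([^<]*\)</div>|<h2>\1</h2>|g')

printf 'greedy: %s\nnarrow: %s\n' "$greedy" "$narrow"
```

If each title div is on its own line, the greedy version from the answer behaves the same as the narrowed one.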