
Using sed – how to replace two HTML tags or patterns with unknown content in-between?

I want to leave the unknown content between tags intact, but want to match all tags that use:

<div class="section1-title">arbitrary content here</div>

and replace the surrounding tags with:

<h2>arbitrary content here</h2>

I’ve come up with the following, but it’s not working: in the second part, sed literally substitutes “<h2[>].*[<]/h2[>]” for each match found.

sed -i 's/[<]div class="section1-title"[>].*[<]/div[>]/<h2[>].*[<]/h2[>]/g'

Specifically, I’d like to know how to leave that middle content intact, no matter what is in there, and only replace the surrounding tags. There are quite a few other elements with a closing </div>, so I can’t just search & replace the opening and closing tags separately. The first part of the sed statement does seem to match the right content as far as I can tell; it’s mostly part 2 that I’m unsure of.

Answer

What you need is a backreference: capture the content in a group, then reuse that group in the replacement.

    bash-3.2$ sed 's/<div class="section1-title">\(.*\)<\/div>/<h2>\1<\/h2>/g' <<< '<div class="section1-title">arbitrary content here</div>'
    <h2>arbitrary content here</h2>

The escaped parentheses around your content, \(.*\), capture it as a group, so it can be reused unchanged in the replacement as \1.
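One caveat: .* is greedy, so if a single line holds more than one of these divs, the capture runs through to the last </div> on that line. A minimal sketch of the difference, assuming the title text contains no nested tags (the sample input and the variable names are made up for illustration):

```shell
# Hypothetical sample input: two matching divs on the same line.
input='<div class="section1-title">One</div><div class="section1-title">Two</div>'

# Greedy .* captures everything up to the LAST </div> on the line.
greedy=$(printf '%s\n' "$input" |
  sed 's/<div class="section1-title">\(.*\)<\/div>/<h2>\1<\/h2>/g')

# [^<]* stops at the next "<", so each div is rewritten on its own.
# This only works if the title text itself contains no "<".
bounded=$(printf '%s\n' "$input" |
  sed 's/<div class="section1-title">\([^<]*\)<\/div>/<h2>\1<\/h2>/g')

printf '%s\n' "$greedy"    # <h2>One</div><div class="section1-title">Two</h2>
printf '%s\n' "$bounded"   # <h2>One</h2><h2>Two</h2>
```

If each title sits on its own line, as in the question, the .* version is fine as written.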

See: https://www.regular-expressions.info/backref.html

and ".bash_profile sed: \1 not defined in the RE" for an explanation of why the parentheses must be escaped in your regex.
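If the backslashes get hard to read, an alternative sketch is to switch sed to extended regular expressions with -E, where plain ( ) form the group, and to pick a delimiter other than / so the closing tags need no escaping. This assumes your sed supports -E (current GNU and BSD/macOS versions do); the variable names are only for illustration:

```shell
input='<div class="section1-title">arbitrary content here</div>'

# -E: extended regex, so ( ) group without backslashes; \1 still refers to the group.
# '#' as the delimiter means the "/" in </div> and </h2> needs no escaping.
result=$(printf '%s\n' "$input" |
  sed -E 's#<div class="section1-title">(.*)</div>#<h2>\1</h2>#g')

printf '%s\n' "$result"    # <h2>arbitrary content here</h2>
```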

User contributions licensed under: CC BY-SA