C: strip HTML between tags

How can I strip the HTML from a document, removing everything between and including the <…> tags, using C? My current program uses curl to fetch the contents of a webpage and writes them to a text file. It then reads from the text file and removes the < and > characters, but I am unsure how to remove everything between them.


Answer

I am posting just the code that removes the contents between the '<' and '>' characters, assuming that you are dealing with proper HTML, meaning that you don't have one tag nested in the declaration of another, like <html < body> >. I am changing only a small portion of your code. I also remove the tags from the buf variable entirely, instead of replacing the undesired characters with spaces, because I think this will be more useful to you (correct me if I am wrong).
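A minimal sketch of this approach, assuming buf is a NUL-terminated, writable char array that already holds the page contents. The function name strip_tags is hypothetical and is not taken from the question's program. It copies every character that lies outside a '<' ... '>' pair toward the front of the buffer, so the tags and their contents disappear instead of being replaced by spaces.

```c
/* Hypothetical helper: removes every '<' ... '>' span from buf in place.
 * Assumes tags are not nested inside one another. */
static void strip_tags(char *buf)
{
    char *dst = buf;   /* next position to write a kept character */
    int in_tag = 0;    /* nonzero while we are between '<' and '>' */

    for (const char *src = buf; *src != '\0'; ++src) {
        if (*src == '<') {
            in_tag = 1;            /* tag starts: stop copying */
        } else if (*src == '>' && in_tag) {
            in_tag = 0;            /* tag ends: resume copying after it */
        } else if (!in_tag) {
            *dst++ = *src;         /* ordinary text: keep it */
        }
    }
    *dst = '\0';                   /* terminate the shortened string */
}
```

Calling strip_tags(buf) on a buffer containing "<p>Hello <b>world</b></p>" leaves "Hello world" in buf. A '>' that appears outside any tag is kept as ordinary text, and an unclosed '<' drops everything after it. Note that this only removes the markup itself; the text inside elements such as script or style would still remain.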
