Remove newline before a match – Linux

Question

I want to remove the newline before the closing </script> tag in my HTML file with a Linux command (sed, awk, ...).

Sample input:

Sample output:

I tried different syntaxes, but none of them worked.

Accepted Answer

First of all, as mentioned in the comments: Don't parse XML with Regex! Never do it, never think about it. Make it a habit not to think about it! Sometimes it might look like a simple task that can be performed with sed, awk, or any other regex-based tool, but no ...

What you can do, on the other hand, if you really want to use sed or awk, is process the file with xmlstarlet first and convert it into the PYX format.

The PYX format is a line-oriented representation of XML documents that is derived from the SGML ESIS format (see ESIS - ISO 8879 Element Structure Information Set spec, ISO/IEC JTC1/SC18/WG8 N931 (ESIS)).

So what you really want to do is something like:

    $ xmlstarlet pyx | do_your_magic_here | xmlstarlet depyx > file.new.html

In your case this would be something like:

    $ xmlstarlet pyx file.html | awk 'c~/^- * *$/&&/^)script$/{c=$0;next}{print c; c=$0}END{print c}' | xmlstarlet depyx

This will output: JavaScript Ders 2
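To make the PYX step more concrete, here is a minimal sketch of the awk stage on its own, run against a hand-written PYX fragment rather than real xmlstarlet output. The sample stream and the function names pyx_sample and drop_ws_before_script_close are hypothetical, and the sketch assumes that a whitespace-only text node shows up as a single "-" line containing only spaces and literal \n escapes. It is a variant of the idea above, not the exact command from the answer.

```shell
#!/bin/sh

# Hypothetical PYX stream (an assumption, written by hand):
#   "("  opens an element, "A" is an attribute, "-" is character data
#   with newlines written as a literal \n, and ")" closes an element.
pyx_sample() {
cat <<'EOF'
(html
(script
Asrc app.js
-\n
)script
-\n
)html
EOF
}

# Hold every line back by one. When the held line is a whitespace-only
# text node and the current line is ")script", discard the held line.
drop_ws_before_script_close() {
  awk '
    NR > 1 {
      if (prev ~ /^-( |\\n)*$/ && $0 == ")script") { prev = $0; next }
      print prev
    }
    { prev = $0 }
    END { if (NR > 0) print prev }
  '
}

result=$(pyx_sample | drop_ws_before_script_close)
printf '%s\n' "$result"
```

On this sample, the "-\n" line directly before ")script" is dropped, while the "-\n" line before ")html" is kept. With real input, the two functions would sit between xmlstarlet pyx and xmlstarlet depyx, as in the pipeline above.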

Answer