How to replace [^u0009u000Au000Du0020-uD7FFuE000-uFFFDu10000-u10FFF]+ with "" in a file using sed or anything else?

Question

I find and replace some strange characters in an XML file using a text editor with a regular expression. Now I need to do the same thing from the Linux command line. How can I use sed, or any other tool, to do the same find-and-replace job on the command line? Thank you in advance.

Accepted Answer

You can try this:

sed -E 's/u(0009|000A|000D|0020|D7FF|E000|FFFD|10000|10FFF)//g' <<< "[^u0009u000Au000Du0020-uD7FFuE000-uFFFDu10000-u10FFF]"

The -E flag enables extended regular expressions, which the grouping and alternation in this pattern require.

Before replacing, be sure you really want to remove these characters, as some of them are tabs, newlines, spaces…

Update: a more generic pattern, based on the 4- and 5-digit hex codes in your sample:

sed -E 's/u[0-9A-F]{4}[0-9A-F]?//g' <<< "[^u0009u000Au000Du0020-uD7FFuE000-uFFFDu10000-u10FFF]"

This removes every u followed by 4 or 5 hex digits.

Please note that a capitalized word (e.g. Foo) directly following a 4-digit hex code will be partly matched: u0000Foo will be changed to oo, because the F of Foo matches the optional fifth hex digit.
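Since the question asks about a file rather than a here-string, here is a minimal sketch of the generic pattern applied to a file. The file names and sample lines are assumptions made up for illustration. It uses -E so that {4} and ? act as quantifiers, and LC_ALL=C so that the range A-F covers only uppercase ASCII letters regardless of locale.

```shell
#!/usr/bin/env bash
# Hypothetical sample data; the file name and contents are illustrative only.
tmpdir=$(mktemp -d)
printf '%s\n' 'au0009b' 'xu10FFFy' 'u0000Foo' > "$tmpdir/sample.txt"

# Remove a literal "u" followed by 4 or 5 uppercase hex digits.
# -E       : extended regex, so {4} and ? are quantifiers
# LC_ALL=C : keeps the range A-F limited to uppercase ASCII letters
LC_ALL=C sed -E 's/u[0-9A-F]{4}[0-9A-F]?//g' "$tmpdir/sample.txt" > "$tmpdir/cleaned.txt"

cleaned=$(cat "$tmpdir/cleaned.txt")
printf '%s\n' "$cleaned"
# prints:
# ab
# xy
# oo   (the pitfall: the F of Foo is consumed as a fifth hex digit)

rm -rf "$tmpdir"
```

Writing to a separate output file keeps the original untouched, so the result can be checked before the source file is overwritten.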


Answer