I find and replace some strange characters in an XML file using a text editor, with this regular expression:
[^u0009u000Au000Du0020-uD7FFuE000-uFFFDu10000-u10FFF]+ ---> ""
Now I need to do it on the Linux command line.
How can I use sed, or anything else that does the same find-and-replace job, on the Linux command line?
Thank you in advance
Answer
You can try this (the -E option enables extended regular expressions, which the (a|b) alternation needs):
sed -E 's/u(0009|000A|000D|0020|D7FF|E000|FFFD|10000|10FFF)//g' <<< "[^u0009u000Au000Du0020-uD7FFuE000-uFFFDu10000-u10FFF]"
Before replacing, be sure you really want to replace these characters, as some of them are tabs, newlines, spaces…
Update:
A more generic pattern, based on the 4- and 5-digit hex codes in your sample:
sed -E 's/u[0-9A-F]{4}[0-9A-F]?//g' <<< "[^u0009u000Au000Du0020-uD7FFuE000-uFFFDu10000-u10FFF]"
This removes every u followed by 4 or 5 hexadecimal digits.
Please note that a word starting with an uppercase A-F (e.g. Foo) directly after a 4-digit hex code will be partly matched: u0000Foo will be changed to oo, because the F of Foo matches the optional fifth hex digit.
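The caveat can be reproduced with a small sketch. It assumes a sed that supports -E (GNU and BSD sed both do); the function name strip_codes is hypothetical, and LC_ALL=C is set here only so that the A-F range means exactly the uppercase ASCII letters.

```shell
# Hypothetical wrapper around the generic pattern: removes "u" followed by
# 4 hex digits plus an optional 5th one.
# LC_ALL=C (an assumption added here) keeps [A-F] limited to uppercase ASCII.
strip_codes() {
  printf '%s\n' "$1" | LC_ALL=C sed -E 's/u[0-9A-F]{4}[0-9A-F]?//g'
}

strip_codes 'u0000Foo'    # prints: oo   (the F is taken as a 5th hex digit)
strip_codes 'u0000 Foo'   # prints:  Foo (the space stops the match after 4 digits)
```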