Using tr to strip characters but keep line breaks

I am trying to format some text that was converted from UTF-16 to ASCII. The output looks like this:

C^@H^@M^@M^@2^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
T^@h^@e^@m^@e^@ ^@M^@a^@n^@a^@g^@e^@r^@ ^@f^@o^@r^@ ^@3^@D^@S^@^@^@^@^@^@^@^@^@^@^@^@^@^@

The only text I want out of that is:

CHMM2
Theme Manager for 3DS

So there is a line break (“\n”) at the end of each line, and when I use

tr -cs 'a-zA-Z0-9' 'newtext' infile.txt > outfile.txt

it strips the newline as well, so all the text ends up in one big string on one line.

Can anyone assist with figuring out how to strip out only the ^@’s while keeping spaces and newlines?

Answer

The ^@s are almost certainly null characters, \0s, so:

tr -d '\0' < infile.txt > outfile.txt

will get rid of them, leaving spaces and newlines alone.
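
To see that deleting the null bytes keeps the line breaks, here is a minimal sketch. The file names sample.txt and cleaned.txt and the sample contents are made up for illustration; the null is written as the three-digit octal escape \000, which should behave the same as \0 here.

```shell
# Hypothetical sample: letters interleaved with NUL bytes, two lines,
# with a little NUL padding at the end of each line.
printf 'C\000H\000M\000M\000\000\000\nT\000h\000e\000m\000e\000\000\n' > sample.txt

# Delete only the NUL bytes; every other byte, including spaces and
# newlines, passes through unchanged.
tr -d '\000' < sample.txt > cleaned.txt

cat cleaned.txt
```

Note that tr reads standard input only, so the input file has to be redirected with < rather than passed as an argument.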

But this is not really the correct solution. You should simply use the iconv command to convert from UTF-16 to UTF-8 (see its man page for more information). That is, of course, what you’re really trying to accomplish here, and this will be the correct way to do it.

This is an XY problem. Your problem is not deleting the null characters. Your real problem is how to convert from UTF-16 to either UTF-8, or maybe US-ASCII (and I chose UTF-8, as the conservative answer).
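
As a rough sketch of the iconv route: the file names utf16.txt and utf8.txt are made up, and the little-endian variant UTF-16LE is an assumption based on each letter appearing before its ^@ in the output shown in the question. If the real file starts with a byte order mark, plain UTF-16 may be the better source encoding.

```shell
# Build a hypothetical UTF-16LE test file from known text.
printf 'CHMM2\nTheme Manager for 3DS\n' | iconv -f UTF-8 -t UTF-16LE > utf16.txt

# Convert it back to UTF-8; line breaks and spaces survive the conversion.
iconv -f UTF-16LE -t UTF-8 utf16.txt > utf8.txt

cat utf8.txt
```

If the original file also pads each field with extra null characters, as the long runs of ^@ in the question suggest, those would come through the conversion as null bytes and could still be removed afterwards with the tr -d command above.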
