How to clean a data file from binary junk?

I have this data file, which is supposed to be a normal ASCII file. However, it has some junk at the end of the first line, which only shows up when I look at it with vi or less:

  y mon d  h XX11 XX22 XX33 XX44 XX55 XX66^@
2011  6 6 10 14.0 15.5 14.3 11.3 16.2 16.1

grep also reports it as a binary file: Binary file data.dat matches

This is causing trouble in my parsing script. I split each line and put the fields into an array. The last element (XX66) of the first array is corrupted by the junk, so I can't match against it.

How can I clean that line or the array? I have tried running dos2unix on the file and stripping the array members with s/\s+$//. What is that junk anyway? Unfortunately I have no control over the data; it comes from a third party.

Any ideas?


Answer

grep is trying to be smart: when it sees an unprintable character, it switches to "binary" mode. Add -a (or --text) to force grep to stay in text mode.
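For instance, forcing text mode on the file from your question would look something like this:

  # treat the file as text even though it contains a NUL byte
  grep -a XX66 data.dat
  # long form of the same option
  grep --text XX66 data.dat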

As for sed, try sed -e 's/[^ -~]*//g', which says "change every character that isn't between space and tilde (0x20 and 0x7E, respectively) into nothing". That will strip tabs too, but you can insert a literal tab character just before the space in the bracket expression to keep them (or any other special character you want to preserve).
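A quick sketch of applying that to the whole file (the output filename here is just an example):

  # delete every byte outside the printable ASCII range 0x20-0x7E
  sed -e 's/[^ -~]*//g' data.dat > data_clean.dat
  # to keep tabs as well, type a literal Tab (e.g. Ctrl-V then Tab)
  # right before the space inside the bracket expression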

The "^@" is how vi and less represent a NUL byte (ASCII 0, often written "\0"). Some programs may also treat it as an end-of-file marker if they were implemented naively.
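If you want to verify what the stray byte actually is, od can show it (just a quick check, not part of the fix):

  # dump the first line character by character; the junk shows up as \0
  head -n 1 data.dat | od -c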
