The Linux file command does a very good job of recognising file types and gives very fine-grained results. The diff tool is able to tell binary files from text files, producing a different output.
Is there a way to tell binary files from text files? All I want is a yes/no answer as to whether a given file is binary. Because it’s difficult to define binary, let’s say I want to know if diff will attempt a text-based comparison.
To clarify the question: I do not care if it’s ASCII text or XML as long as it’s text. Also, I do not want to differentiate between MP3 and JPEG files, as they’re all binary.
Answer
The diff manual specifies that:
diff determines whether a file is text or binary by checking the first few bytes in the file; the exact number of bytes is system dependent, but it is typically several thousand. If every byte in that part of the file is non-null, diff considers the file to be text; otherwise it considers the file to be binary.
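The heuristic quoted above can be approximated in a few lines of Python. This is only a sketch of the rule as the manual describes it, not diff's actual implementation: the function name looks_binary and the sample_size default of 8192 bytes are assumptions chosen for illustration, since the manual only says the number of bytes is system dependent and typically several thousand.

```python
def looks_binary(path, sample_size=8192):
    """Guess whether a file is binary using the rule from the diff manual.

    Reads up to sample_size bytes from the start of the file and reports
    the file as binary if any of them is a NUL byte. sample_size is an
    assumed value, not the number diff uses on any particular system.
    """
    with open(path, "rb") as f:
        chunk = f.read(sample_size)
    # Text under this rule means: no NUL byte anywhere in the sampled prefix.
    return b"\x00" in chunk
```

Calling looks_binary("some_file") returns True or False, which gives the yes/no answer asked for. Note that under this rule an empty file counts as text, and text encoded as UTF-16 will usually be reported as binary because it contains NUL bytes.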