C: strtok and newlines in Windows vs Linux

I’m working on a C school assignment that is intended to be done on Windows, but I’m programming it on OS X. The other students, working on Windows, have no problems reading a file; I do.

The code provided by the tutors splits the contents of a file on \n using this code:

/* Read ADFGX information */
adfgx = read_from_file("adfgx.txt");

/* Define the alphabet */
alphabet = strtok(adfgx, "\n");

/* Define the code symbols */
symbols = strtok(NULL, "\n");

However, the file adfgx.txt (which is provided for the assignment) has Windows-style newlines (\r\n): I checked it with a hex editor. Compiling this with the Microsoft C compiler from Visual Studio and running it on Windows splits the file correctly on newlines (\r\n), which I find weird, because I cannot find any documentation on this behavior. On the other hand, when I compile it on OS X using gcc and run it, the \r is still included in the tokenized string, because it obviously splits only on \n. If I change the delimiters in the strtok call to "\r\n", it works for me.

Is it normal that this behaves differently on Windows and Unix? How should I handle this in real-life situations (assuming I’m trying to write portable code for Windows and Unix in C that should handle file input that uses \r\n)?

Answer

If you open the file with fopen("adfgx.txt", "r") on Windows, the file gets opened in “text mode”, and each \r\n pair is implicitly translated to a single \n by subsequent fread calls, so your program never sees the \r. If you had opened the file on Windows with fopen("adfgx.txt", "rb"), the file would be opened in “binary mode”, and the \r char would remain. To learn about “rb” and the other mode strings, read the documentation for the mode parameter of fopen on Windows. And as you might imagine, fwrite on Windows will automatically insert a \r into the stream in front of each \n char (as long as the file was not opened in binary mode).
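
To make that translation concrete, here is a minimal sketch of roughly what text mode does to the data on read, written as portable C that you could apply yourself to a buffer read in binary mode or on Unix. The helper name crlf_to_lf is made up for illustration; it is not part of the assignment code or of any library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a standard function): collapse every "\r\n"
   pair in buf to a single "\n", in place. A lone '\r' that is not
   followed by '\n' is left alone. Returns the new string length. */
size_t crlf_to_lf(char *buf)
{
    assert(buf != NULL);

    size_t out = 0;
    for (size_t in = 0; buf[in] != '\0'; in++) {
        /* buf[in] is not the terminator here, so reading buf[in + 1] is safe */
        if (buf[in] == '\r' && buf[in + 1] == '\n')
            continue; /* drop the '\r'; the '\n' is copied on the next pass */
        buf[out++] = buf[in];
    }
    buf[out] = '\0';
    return out;
}
```

Calling crlf_to_lf(adfgx) before the original strtok(adfgx, "\n") calls should give the same tokens on OS X that the Windows build sees, assuming read_from_file returns a writable, NUL-terminated buffer.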

Unix and macOS treat \r as an ordinary character. Hence, strtok(NULL, "\n") won’t strip off the ‘\r’ char, because you are not splitting on it.

The easy cross-platform fix would be to invoke strtok as follows on all platforms:

/* Define the alphabet */
alphabet = strtok(adfgx, "\r\n");

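I think passing "\r\n" as the delimiter string will clear up most of your issues with reading Windows text files on Unix and vice versa. strtok treats each character in the delimiter string as a separate delimiter and skips over runs of consecutive delimiters, so it never returns an empty string: a \r\n pair counts as a single line break, and files with plain \n endings split the same way. The flip side is that genuinely blank lines are skipped silently, so if blank lines matter in your input you will need a different way of splitting.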
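
As a quick check of the "\r\n" delimiter approach, here is a small sketch that wraps the two strtok calls in a function. The name split_two_lines and its signature are my own invention for illustration; the buffer is assumed to be writable and NUL-terminated, as strtok requires.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: point *first and *second at the first two
   non-empty lines of buf, splitting on '\r' and '\n'. strtok modifies
   buf in place. Returns how many lines were found (0, 1 or 2). */
int split_two_lines(char *buf, char **first, char **second)
{
    assert(buf != NULL && first != NULL && second != NULL);

    *first = strtok(buf, "\r\n");
    /* Only continue tokenizing if the first call found something */
    *second = (*first != NULL) ? strtok(NULL, "\r\n") : NULL;

    return (*first != NULL) + (*second != NULL);
}
```

With this, split_two_lines(adfgx, &alphabet, &symbols) yields the same two strings whether the buffer contains \r\n or plain \n line endings, and neither string carries a trailing \r.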

User contributions licensed under: CC BY-SA