How to reliably read data from a file which is being continuously written by another process?

I have one process that rewrites a file every few seconds (it overwrites the contents rather than appending). The data is JSON. Another process has to read this file at regular intervals, so it may end up reading the file while the writer is still in the middle of writing it.

One solution I can think of is for the writer process to also write a corresponding checksum file. The reader process would then read both the file and its checksum file. If the calculated checksum doesn't match, the reader would retry until it does. That way it would know that it has read the correct data.

Or maybe a better solution is to read the file twice after a certain time period (much less than the writing interval of the writing process), and see if the read data matches.

A third way could be to write some magic data at the end of the file, so that the reading process knows it has read the whole file once it has encountered that magic data at the end.

What do you think? Are these solutions viable, or are there better methods to achieve this?


Answer

Create an entirely new file each time, and rename() the new file over the old one once it has been completely written. From the Linux rename(2) man page:

"If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. …"

Some copy of the file will always be there, and it will always be complete and correct.

So, instead of

writeDataFile( "/path/to/data/file.json" );

and then trying to figure out what to do in the reader process(es), you simply do

writeDataFile( "/path/to/data/file.json.new" );
rename( "/path/to/data/file.json.new", "/path/to/data/file.json" );

No locking is necessary, nor any reading of the file and computing checksums and hoping it’s correct.
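
For illustration, here is a minimal Python sketch of the writer side. The helper name write_data_file and the ".new" suffix are assumptions carried over from the pseudocode above, not part of any library. On POSIX, os.replace() is a thin wrapper over rename(). The fsync() call is an extra step this answer does not require: it only matters if the data must survive a crash, not for readers seeing a complete file. The sketch assumes a single writer and that the temporary file lives on the same filesystem as the target, since rename() cannot atomically move a file across filesystems.

```python
import json
import os


def write_data_file(path, data):
    """Write data as JSON to path without ever exposing a partial file.

    Hypothetical helper: assumes one writer process and that path + ".new"
    is on the same filesystem as path.
    """
    tmp_path = path + ".new"
    with open(tmp_path, "w") as tmp:
        json.dump(data, tmp)
        tmp.flush()
        # Optional durability step: push the contents to disk before the swap.
        os.fsync(tmp.fileno())
    # Atomically swap the finished file into place (rename() on POSIX).
    os.replace(tmp_path, path)
```

A reader that opens the target path at any moment sees either the previous complete file or the new complete file, never a mixture.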

The only issue is that a reader process has to open() the file each time it needs the latest copy. It can't keep an open file descriptor on the file and try to read new contents from it, because the rename() call unlinks the original file and replaces it with an entirely new one. A descriptor opened earlier keeps referring to the old, now unlinked, file.
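
The reader side can be sketched in Python as well. The function name read_latest is hypothetical. The point is only that the file is opened afresh on every call, so each call picks up whichever complete file currently sits at the path.

```python
import json


def read_latest(path):
    """Open the file anew on each call and return the parsed JSON.

    Hypothetical helper: do not cache the file object between calls,
    because after a rename() the old handle still refers to the old file.
    """
    with open(path) as f:
        return json.load(f)
```

Calling read_latest() from a timer or polling loop at whatever interval suits the reader is enough. No retry or checksum logic is needed as long as the writer always replaces the file via rename().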

User contributions licensed under: CC BY-SA