Redirected output from a subprocess call getting lost?

I have some Python code that goes roughly like this, using some libraries that you may or may not have:

import subprocess

# Open the local file for writing
vcf_file = open(local_filename, "w")

# Download the region to the file.
subprocess.check_call(["bcftools", "view",
    options.truth_url.format(sample_name), "-r",
    "{}:{}-{}".format(ref_name, ref_start, ref_end)], stdout=vcf_file)

# Close parent process's copy of the file object
vcf_file.close()

# Upload it
file_id = job.fileStore.writeGlobalFile(local_filename)

Basically, I’m starting a subprocess that’s supposed to go download some data for me and print it to standard out. I’m redirecting that data to a file, and then, as soon as the subprocess call returns, I’m closing my handle to the file and then copying the file elsewhere.

I’m observing that, sometimes, the tail end of the data I’m expecting isn’t making it into the copy. Now, it’s possible that bcftools is just occasionally not writing that data, but I’m worried that I might be doing something unsafe: somehow getting access to the file after subprocess.check_call() has returned, but before the data that the child process wrote to standard output has made it onto the disk where I can see it.

Looking at the C++ standard (bcftools is implemented in C/C++, and the C++ standard spells out the stream behavior at exit), it looks like when a program exits normally, all open streams (including standard output) are flushed and closed. See the [lib.support.start.term] section, describing the behavior of exit(), which is called implicitly when main() returns:

– Next, all open C streams (as mediated by the function signatures declared in &lt;cstdio&gt;) with unwritten buffered data are flushed, all open C streams are closed, and all files created by calling tmpfile() are removed.

– Finally, control is returned to the host environment. If status is zero or EXIT_SUCCESS, an implementation-defined form of the status successful termination is returned. If status is EXIT_FAILURE, an implementation-defined form of the status unsuccessful termination is returned. Otherwise the status returned is implementation-defined.

So before the child process exits, it closes (and thus flushes) standard output.
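
A quick way to see this behavior is a sketch like the following (the Python one-liner stands in for any program, such as bcftools, whose block-buffered stdout gets flushed by a normal exit):

import subprocess

with open("demo.txt", "w") as f:
    # The child writes with no newline and no explicit flush; its buffered
    # stdout is still flushed when it terminates normally.
    subprocess.check_call(
        ["python3", "-c", "import sys; sys.stdout.write('buffered data')"],
        stdout=f)

with open("demo.txt") as f:
    print(f.read())  # prints "buffered data"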

However, the manual page for Linux close(2) notes that closing a file descriptor does not necessarily guarantee that any data written to it has actually made it to disk:

A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored, use fsync(2). (It will depend on the disk hardware at this point.)

Thus, it would appear that, when a process exits, its standard output stream is flushed, but if that stream is actually backed by a file descriptor pointing to a file on disk, the write to disk is not guaranteed to have completed. I suspect that this may be what is going on here.

So, my actual questions:

  1. Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?

  2. Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?

  3. Should I be calling flush() or some Python version of fsync() on the parent process’s copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?


Answer

Yes, minutes can pass before the data is physically written to the disk. But you can read it back long before that.

Unless you are worried about a power failure or a kernel panic, it doesn’t matter whether the data is on disk. What matters is whether the kernel thinks the data has been written.

It is safe to read from the file as soon as check_call() returns. If you don’t see all the data, it may indicate a bug in bcftools, or that writeGlobalFile() doesn’t upload all the data from the file. You could try to work around the former by disabling block buffering for bcftools’ stdout (provide a pseudo-tty, use the unbuffer command-line utility, etc.), as sketched below.
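
Here is a minimal sketch of that workaround, assuming the unbuffer utility (from the expect package) is installed; the URL and region are placeholders:

import subprocess

truth_url = "https://example.com/truth.vcf.gz"  # placeholder
region = "chr1:1000-2000"                       # placeholder

with open("region.vcf", "w") as vcf_file:
    # unbuffer gives the child a pseudo-tty for stdout, so bcftools'
    # C library switches from block buffering to line buffering.
    subprocess.check_call(
        ["unbuffer", "bcftools", "view", truth_url, "-r", region],
        stdout=vcf_file)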

Q: Is my reading of the specs correct? Can a child process appear to its parent to have terminated before its redirected standard output is available on disk?

Yes, and yes.

Q: Is it possible to somehow wait until all data written by the child process to files has actually been synced to disk by the OS?

No. fsync() is not enough in the general case, and you likely don’t need it anyway: reading the data back is a different issue from making sure it is written to disk.

Q: Should I be calling flush() or some Python version of fsync() on the parent process’s copy of the file object? Can that force writes to the same file descriptor by child processes to be committed to disk?

It would be pointless. .flush() flushes only the buffers that are internal to the parent process, and says nothing about data the child wrote through its own descriptor. (You can use open(filename, 'wb', 0) to avoid creating an unnecessary buffer in the parent in the first place.)

fsync() works on a file descriptor (the child has its own file descriptor). I don’t know whether the kernel uses different buffers for different file descriptors referring to the same disk file. Again, it doesn’t matter: if you observe missing data in the absence of a crash, fsync() won’t help here.
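
For completeness, here is what flushing and syncing in the parent would look like; a sketch only, since (as argued above) it addresses crash durability, not the missing-data symptom. The Linux fsync(2) man page says fsync() flushes the modified in-core data of the file itself, so it is issued here through the parent’s descriptor after the child has exited (the URL and region are placeholders):

import os
import subprocess

truth_url = "https://example.com/truth.vcf.gz"  # placeholder
region = "chr1:1000-2000"                       # placeholder

with open("region.vcf", "w") as vcf_file:
    subprocess.check_call(
        ["bcftools", "view", truth_url, "-r", region],
        stdout=vcf_file)
    vcf_file.flush()             # flushes only the parent's userspace buffer
    os.fsync(vcf_file.fileno())  # asks the kernel to push the file's dirty pages to the device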

Q: Just to be clear, I see that you’re asserting that the data should indeed be readable by other processes, because the relevant OS buffers are shared between processes. But what’s your source for that assertion? Is there a place in a spec or the Linux documentation you can point to that guarantees that those buffers are shared?

Look for “After a write() to a regular file has successfully returned” in the POSIX specification of write():

Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.
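
That guarantee is stated in terms of byte positions in the file, not of any particular descriptor. A sketch like this makes it concrete: the writer and reader below hold independent descriptors to the same file, and the read() must still observe the completed write():

import os

# Two independent descriptors for the same file: one writer, one reader.
wfd = os.open("shared.txt", os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
rfd = os.open("shared.txt", os.O_RDONLY)

os.write(wfd, b"hello")  # write() has successfully returned...
print(os.read(rfd, 5))   # ...so this read() must return b"hello", no fsync needed

os.close(wfd)
os.close(rfd)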
