Imagine a huge file that my program needs to edit. To speed up reads, I use mmap() and read only the parts I’m viewing. However, if I want to add a line in the middle of the file, what’s the best approach?
Is the only way to add a line and then move the rest of the file? That sounds expensive.
So my question is basically: What’s the most efficient way of adding data in the middle of a huge file?
Answer
The only portable way to insert data in the middle of any file, huge or small, on Linux or POSIX is to copy that file into a fresh one and later rename(2) the copy over the original. So you copy its head (up to the insertion point), append the new data to that copy, and then copy the tail (everything after the insertion point). You might also consider calling posix_fadvise(2) (or even the Linux-specific readahead(2)…), but that does not alleviate the need to copy all the data. mmap(2) might be used, e.g. to replace read(2), but whatever you do, you still have to copy all the data.
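To make the copy-and-rename steps concrete, here is a minimal sketch in C. The function names insert_into_file, copy_bytes and write_all are hypothetical helpers made up for this illustration, and it assumes the temporary file can be created in the same directory as the original (so that rename(2) stays on one filesystem). It does not retry on EINTR and does not preserve ownership or extended attributes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes of `data` to fd. Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *data, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, data, len);
        if (w <= 0)
            return -1;
        data += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copy `count` bytes from `in` to `out`; a negative count means "until EOF". */
static int copy_bytes(int in, int out, off_t count)
{
    char buf[65536];
    while (count != 0) {
        size_t want = sizeof buf;
        if (count > 0 && (off_t)want > count)
            want = (size_t)count;
        ssize_t n = read(in, buf, want);
        if (n < 0)
            return -1;
        if (n == 0)                      /* EOF: expected for the tail only */
            return count < 0 ? 0 : -1;
        if (write_all(out, buf, (size_t)n) != 0)
            return -1;
        if (count > 0)
            count -= n;
    }
    return 0;
}

/* Hypothetical helper: insert `len` bytes of `data` at byte offset `pos`
 * by writing head + data + tail to a temporary copy, then renaming it
 * over the original. Returns 0 on success, -1 on error. */
int insert_into_file(const char *path, off_t pos, const void *data, size_t len)
{
    char tmp[4096];
    struct stat st;

    if (pos < 0)
        return -1;
    int n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    int in = open(path, O_RDONLY);
    if (in < 0)
        return -1;
    int out = mkstemp(tmp);              /* temp file next to the original */
    if (out < 0) {
        close(in);
        return -1;
    }

    int ok = fstat(in, &st) == 0
          && fchmod(out, st.st_mode & 07777) == 0   /* keep permission bits */
          && copy_bytes(in, out, pos) == 0          /* head */
          && write_all(out, data, len) == 0         /* inserted data */
          && copy_bytes(in, out, -1) == 0           /* tail */
          && fsync(out) == 0;

    close(in);
    if (close(out) != 0)
        ok = 0;
    if (ok && rename(tmp, path) != 0)    /* atomically replace the original */
        ok = 0;
    if (!ok) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

For example, calling insert_into_file("notes.txt", 6, "big ", 4) on a file containing "hello world" should leave "hello big world" in it. Because the original is only replaced by the final rename(2), a crash partway through leaves the old file intact, though a stray temporary file may remain.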
Of course, if you happen to be replacing a chunk of data in the middle of the file with another chunk of the same size (so no real insertion), you can use plain lseek(2) + write(2).
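The same-size replacement case could look roughly like the following sketch. The name overwrite_at is a hypothetical helper for this illustration. Note that if pos + len runs past the current end of the file, the file grows instead of being overwritten in place, so the caller is assumed to have checked the sizes.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrite `len` bytes at byte offset `pos` in place.
 * Nothing is shifted, so the rest of the file is left untouched.
 * Returns 0 on success, -1 on error. */
int overwrite_at(const char *path, off_t pos, const void *data, size_t len)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* Move the file offset to the start of the chunk being replaced. */
    int ok = lseek(fd, pos, SEEK_SET) != (off_t)-1;

    /* write(2) may write fewer bytes than asked, so loop until done. */
    const char *p = data;
    while (ok && len > 0) {
        ssize_t w = write(fd, p, len);
        if (w <= 0) {
            ok = 0;
        } else {
            p += w;
            len -= (size_t)w;
        }
    }

    if (close(fd) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For instance, overwrite_at("notes.txt", 6, "WORLD", 5) on a file containing "hello world" should turn it into "hello WORLD" without touching any other byte.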
Is the only way to add a line and then move the rest of the file? That sounds expensive.
Yes, it is conceptually the only way.
You should consider using something other than a plain text file: look into SQLite or GDBM (they might be very efficient in your use case). See also this answer. Both provide a higher-level abstraction than POSIX files, so they give you the ability to “insert” data (of course, they are still built internally on POSIX files).