
What is the fastest way to increase the size of a file in Linux on an ext4 filesystem from a C executable without creating holes in the file?

The fastest way to increase the file size that I know of would be to call ftruncate() with the desired size, or to lseek() to the last desired byte and write a single byte there. That doesn't fit my needs in this case, because the resulting hole in the file doesn't reserve space in the filesystem.
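To see the hole described above, one can extend a file with ftruncate() and compare the logical size (st_size) with the space actually allocated (st_blocks). This is a minimal sketch assuming Linux, where st_blocks counts 512-byte units; the helper names extend_with_ftruncate() and report_allocation() are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: extend a file using ftruncate() alone.
 * The new range is a hole: it reads as zeros but reserves no blocks. */
int extend_with_ftruncate(int fd, off_t new_size) {
    return ftruncate(fd, new_size);
}

/* Hypothetical helper: print the logical size next to the allocated size.
 * On Linux, st_blocks is counted in 512-byte units. */
int report_allocation(int fd) {
    struct stat st;
    if (fstat(fd, &st) != 0) {
        return -1;
    }
    printf("size: %lld bytes, allocated: %lld bytes\n",
           (long long)st.st_size, (long long)st.st_blocks * 512);
    return 0;
}
```

On ext4 you would typically expect the allocated figure for a freshly created file to stay near zero after this, although the exact number depends on the filesystem.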

Is the best alternative to use calloc() and write()?

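#include <stdlib.h>
#include <unistd.h>

int increase_file_size_(int fd, int pages) {
    int pagesize = 4096;
    void* data = calloc(pagesize, 1);   // one zero-filled page, reused for every write
    if (data == NULL) {
        return -1;
    }
    for(int i = 0; i < pages; ++i) {
       // In a real world program this would handle partial writes and interruptions
       if (write(fd, data, pagesize) != pagesize) {
          free(data);
          return -1;
       }
    }
    free(data);
    return 0;
}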
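The comment in the loop above mentions partial writes and interruptions. One way to handle both is a small retry helper; this sketch assumes a regular file opened in blocking mode, and the names write_all() and increase_file_size_robust() are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes, retrying after EINTR
 * and continuing after short writes. Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len) {
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR) {
            continue;              /* interrupted before writing; retry */
        }
        if (n <= 0) {
            return -1;             /* real error, or no progress */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Same idea as the question: append `pages` zero-filled 4 KiB pages. */
int increase_file_size_robust(int fd, int pages) {
    const size_t pagesize = 4096;
    void *data = calloc(pagesize, 1);
    if (data == NULL) {
        return -1;
    }
    int rc = 0;
    for (int i = 0; i < pages && rc == 0; ++i) {
        rc = write_all(fd, data, pagesize);
    }
    free(data);                    /* freed on both success and failure */
    return rc;
}
```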

Perhaps this can be made even faster by using writev(). The next version should be faster because calloc() has to zero-initialize less memory, and more of the data fits in the CPU cache.

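#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

int increase_file_size_(int fd, int pages) {
    int pagesize = 4096/16;               // 256-byte zeroed buffer
    void* data = calloc(pagesize, 1);
    if (data == NULL) {
        return -1;
    }
    struct iovec iov[16];
    for(int i = 0; i < 16; ++i) {
      // All 16 iovecs point at the same buffer: 16 * 256 = 4096 bytes per call
      iov[i].iov_base = data;
      iov[i].iov_len = pagesize;
    }
    for(int i = 0; i < pages; ++i) {
       // In a real world program this would handle partial writes and interruptions
       if (writev(fd, iov, 16) != pagesize * 16) {
          free(data);
          return -1;
       }
    }
    free(data);
    return 0;
}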
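For the writev() variant, short writes are easier than usual to recover from: every iovec points at the same zeroed buffer, so whatever a short writev() left out is just more zeros and can be finished with plain write() calls. A sketch under that assumption, with CHUNK, NIOV and increase_file_size_writev() as hypothetical names chosen to mirror the 4096/16 layout above:

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/uio.h>
#include <unistd.h>

#define CHUNK 256 /* hypothetical: 4096 / 16, as in the question */
#define NIOV 16   /* hypothetical: number of iovecs per writev() call */

/* Append `pages` blocks of NIOV * CHUNK (4096) zero bytes with writev().
 * After a short write the remainder is all zeros, finished with write(). */
int increase_file_size_writev(int fd, int pages) {
    char *data = calloc(CHUNK, 1);
    if (data == NULL) {
        return -1;
    }
    struct iovec iov[NIOV];
    for (int i = 0; i < NIOV; ++i) {
        iov[i].iov_base = data;
        iov[i].iov_len = CHUNK;
    }
    int rc = 0;
    for (int i = 0; i < pages && rc == 0; ++i) {
        ssize_t n;
        do {
            n = writev(fd, iov, NIOV); /* retry if interrupted before writing */
        } while (n < 0 && errno == EINTR);
        if (n < 0) {
            rc = -1;
            break;
        }
        size_t missing = (size_t)NIOV * CHUNK - (size_t)n;
        while (missing > 0) {
            ssize_t m = write(fd, data, missing < CHUNK ? missing : CHUNK);
            if (m < 0 && errno == EINTR) {
                continue;
            }
            if (m <= 0) {
                rc = -1;
                break;
            }
            missing -= (size_t)m;
        }
    }
    free(data);
    return rc;
}
```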

I can experiment to see which of these approaches and which page size is fastest. However, is there another approach that is considered the normal best practice for extending a file? Are there other approaches that I should performance-test?

Thank you.


Answer

Take a look at the posix_fallocate() function: it reserves space for a file without necessarily writing any data to occupy that space. The allocated space works sort of like a sparse file in that you can read from it even though you haven’t explicitly written anything to it, but unlike a sparse file, it actually reduces the amount of free space in the filesystem. You’re assured that you can write to that region of the file later without running out of space.
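A minimal sketch of extending a file with posix_fallocate(), assuming the goal is for the file to be at least new_size bytes long with all of that space reserved; reserve_file_size() is a hypothetical name.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: reserve space for bytes [0, new_size).
 * If the file is shorter than new_size, its size is increased.
 * posix_fallocate() returns 0 or an error number; it does not set errno. */
int reserve_file_size(int fd, off_t new_size) {
    int err = posix_fallocate(fd, 0, new_size);
    if (err != 0) {
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));
        return -1;
    }
    return 0;
}
```

Because the error code comes back as the return value, checking errno after a failed posix_fallocate() call would report the wrong thing.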

Note that posix_fallocate() doesn't seem to make any guarantees about the contents of the allocated space if you read it before writing to it. On Linux, the underlying fallocate(2) system call documents that newly allocated regions read back as zero bytes, similar to a sparse file, but portable code probably shouldn't rely on that. Treat it as garbage until you have written something real to it.

Also note that not all filesystem drivers support the preallocation feature that posix_fallocate() takes advantage of. When it isn't supported, glibc's posix_fallocate() falls back on actually writing data to the file (the normal way). Typical Linux filesystems like ext4 and XFS are OK, but if you try it on something like FAT or NTFS, your program will probably end up blocking on I/O for a while.
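If you would rather detect the unsupported case than have the slow fallback happen silently, the Linux-specific fallocate(2) call with mode 0 fails with EOPNOTSUPP instead of emulating. The sketch below, using the hypothetical name preallocate_or_write(), tries it first and only writes zeros itself when preallocation isn't available. It assumes the range lies past the current end of file, because the fallback overwrites whatever is already in that range.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: reserve [offset, offset + len).
 * Returns 1 if the filesystem preallocated the range, 0 if zeros were
 * written instead, and -1 on error. The fallback overwrites existing data,
 * so it is only meant for extending past the current end of file. */
int preallocate_or_write(int fd, off_t offset, off_t len) {
    if (fallocate(fd, 0, offset, len) == 0) {
        return 1;
    }
    if (errno != EOPNOTSUPP) {
        return -1;
    }
    /* Slow path: write real zero bytes, at most 4 KiB per call. */
    const size_t chunk = 4096;
    char *zeros = calloc(chunk, 1);
    if (zeros == NULL) {
        return -1;
    }
    while (len > 0) {
        size_t n = (off_t)chunk < len ? chunk : (size_t)len;
        ssize_t w = pwrite(fd, zeros, n, offset);
        if (w < 0 && errno == EINTR) {
            continue;
        }
        if (w <= 0) {
            free(zeros);
            return -1;
        }
        offset += w;
        len -= w;
    }
    free(zeros);
    return 0;
}
```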

User contributions licensed under: CC BY-SA