C++: closing an open() file read with mmap

I am working with mmap() to read big files quickly, basing my code on the answer to this question (Fast textfile reading in c++).

I am using the second version from sehe's answer:

#include <algorithm>
#include <iostream>
#include <cstdint>   // uintmax_t
#include <cstdio>    // perror
#include <cstdlib>   // exit
#include <cstring>   // memchr

// for mmap:
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>

const char* map_file(const char* fname, size_t& length);

int main()
{
    size_t length;
    auto f = map_file("test.cpp", length);
    auto l = f + length;

    uintmax_t m_numLines = 0;
    while (f && f!=l)
        if ((f = static_cast<const char*>(memchr(f, '\n', l-f))))
            m_numLines++, f++;

    std::cout << "m_numLines = " << m_numLines << "\n";
}

void handle_error(const char* msg) {
    perror(msg);
    exit(255);
}

const char* map_file(const char* fname, size_t& length)
{
    int fd = open(fname, O_RDONLY);
    if (fd == -1)
        handle_error("open");

    // obtain file size
    struct stat sb;
    if (fstat(fd, &sb) == -1)
        handle_error("fstat");

    length = sb.st_size;

    const char* addr = static_cast<const char*>(mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0u));
    if (addr == MAP_FAILED)
        handle_error("mmap");

    // TODO close fd at some point in time, call munmap(...)
    return addr;
}

and it works just great.

But when I run it in a loop over several files (I just renamed the main() function to:

void readFile(std::string &nomeFile) {

and then get the file content in the "f" object of that function with:

size_t length;
auto f = map_file(nomeFile.c_str(), length);
auto l = f + length;

and call it from main() in a loop over a list of filenames), after a while I get:

open: Too many open files

I imagine there is a way to close the descriptor returned by open() after working on a file, but I cannot figure out how and where to put it exactly. I tried:

int fc = close(fd);

at the end of the readFile() function, but it changed nothing.

Thanks a lot in advance for any help!

EDIT:

After the helpful suggestions I received, I compared the performance of different approaches using mmap() and std::cin. See "fast file reading in C++, comparison of different strategies with mmap() and std::cin() results interpretation" for the results.

Answer

Limit to the number of concurrently open files

As you can imagine, keeping a file open consumes resources. So there is in any case a practical limit to the number of open file descriptors on your system. This is why it’s highly recommended to close files that you no longer need.

The exact limit depends on the OS and the configuration. If you want to know more, there are already a lot of answers available for this kind of question.
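
As a rough illustration of how to look up that limit on a POSIX system, the sketch below queries the per-process soft limit with getrlimit(). The helper name soft_fd_limit is hypothetical and not part of the question's code.

```cpp
#include <cstdio>           // std::perror
#include <sys/resource.h>   // getrlimit, RLIMIT_NOFILE

// Hypothetical helper: returns the soft limit on open file descriptors
// for the current process, or 0 if the query fails.
// The value may be RLIM_INFINITY on systems configured without a limit.
rlim_t soft_fd_limit() {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
        std::perror("getrlimit");
        return 0;
    }
    return rl.rlim_cur;
}
```

Printing soft_fd_limit() from main() shows how many descriptors the loop in the question can leak before open() starts failing.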

Special case of mmap

With mmap() you first have to open() the file. Doing so repeatedly in a loop without releasing anything risks hitting the file descriptor limit sooner or later, as you experienced.

The idea of closing the file is not bad. The problem is that close() alone does not release the file, because the mapping keeps its own reference to it. This is specified in the POSIX documentation:

The mmap() function adds an extra reference to the file associated with the file descriptor fildes which is not removed by a subsequent close() on that file descriptor. This reference is removed when there are no more mappings to the file.

Why? Because mmap() ties the file to your system's virtual memory management, and the file is needed for as long as you use the address range it was mapped to.
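
The extra reference described in the POSIX quote can be observed with a small sketch like the one below, assuming a POSIX system. The function name read_after_close is hypothetical. It closes the descriptor immediately after mmap() and only then reads the file's bytes through the mapping.

```cpp
#include <cstddef>
#include <string>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Hypothetical helper: maps a file, closes the descriptor right away,
// then copies the contents out of the still-valid mapping.
// Returns an empty string on error or for an empty file.
std::string read_after_close(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return {};

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        close(fd);
        return {};
    }
    std::size_t length = static_cast<std::size_t>(sb.st_size);

    void* addr = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the descriptor is released; the mapping keeps its own reference
    if (addr == MAP_FAILED) return {};

    // The mapped bytes are still readable although fd is closed.
    std::string copy(static_cast<const char*>(addr), length);
    munmap(addr, length);  // this drops the last reference to the file
    return copy;
}
```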

So how do you remove those mappings? The answer is to use munmap():

The function munmap() removes any mappings for those entire pages containing any part of the address space of the process starting at addr and continuing for len bytes.

And of course, close() the file descriptor that you no longer need. A prudent approach would be to close after munmap(), but in principle, at least on a POSIX compliant system, it should not matter when you’re closing. Nevertheless, check your latest OS documentation to be on the safe side 🙂
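
One way to apply this to the loop in the question is a small RAII wrapper, so that every mapping is removed when it goes out of scope. This is only a sketch under the assumption of a POSIX system. The class MappedFile and the function count_lines are hypothetical names, not part of the original code, and the sketch closes the descriptor right after mmap(), which POSIX allows. Move the close() into the destructor if you prefer the more prudent order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>      // std::memchr
#include <stdexcept>
#include <string>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Hypothetical RAII wrapper: maps a file read-only in the constructor
// and removes the mapping in the destructor.
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = open(path.c_str(), O_RDONLY);
        if (fd == -1) throw std::runtime_error("open failed: " + path);

        struct stat sb;
        if (fstat(fd, &sb) == -1) {
            close(fd);
            throw std::runtime_error("fstat failed: " + path);
        }
        length_ = static_cast<std::size_t>(sb.st_size);

        if (length_ > 0) {  // mmap() rejects a zero length
            void* p = mmap(nullptr, length_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                close(fd);
                throw std::runtime_error("mmap failed: " + path);
            }
            addr_ = static_cast<const char*>(p);
        }
        close(fd);  // descriptor no longer needed once the mapping exists
    }

    ~MappedFile() {
        if (addr_) munmap(const_cast<char*>(addr_), length_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* begin() const { return addr_; }
    const char* end() const { return addr_ + length_; }

private:
    const char* addr_ = nullptr;
    std::size_t length_ = 0;
};

// Counts '\n' bytes with the same memchr loop as in the question.
std::uintmax_t count_lines(const std::string& path) {
    MappedFile file(path);  // unmapped automatically on return
    std::uintmax_t lines = 0;
    const char* f = file.begin();
    const char* l = file.end();
    while (f && f != l) {
        f = static_cast<const char*>(std::memchr(f, '\n', l - f));
        if (f) { ++lines; ++f; }
    }
    return lines;
}
```

Calling count_lines() for each name in the file list should no longer accumulate descriptors or mappings, since both are released before the function returns.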

Note: file mapping is also available on Windows; the documentation about closing the handles is ambiguous about potential memory leaks if mappings remain. This is why I recommend prudence about when to close.

User contributions licensed under: CC BY-SA