High-performance reading – Linux/pthreads

I have a moderately large binary file consisting of independent blocks, like this:

header1
data1
header2
data2
header3
data3
...

The number of blocks, the size of each block and the total size of the file vary quite a lot, but typical numbers are ~1000 blocks with an average block size of 100 kB. The files are generated by an external application which I have no control over, but I want to read them as fast as possible. In many cases I am only interested in a fraction (e.g. 10%) of the blocks, and this is the case I will optimize for.

My current implementation is like this (a rough sketch follows the list):

  1. Open the file and read all the headers, using the information in each header to fseek() to the next header location; retain an open FILE * pointer.
  2. When data is requested, use fseek() to locate the data block, read all the data and return it.

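A rough sketch of this approach (the two-field header layout and the file name blocks.bin are placeholders, since the real format is defined by the external application):

    #include <stdio.h>
    #include <stdlib.h>

    struct header {
        unsigned int id;        /* assumed: block identifier       */
        unsigned int data_size; /* assumed: size of the data block */
    };

    struct index_entry {
        struct header hdr;
        long data_offset;       /* where this block's data starts  */
    };

    int main(void)
    {
        FILE *fp = fopen("blocks.bin", "rb");
        if (!fp) { perror("fopen"); return 1; }

        /* Pass 1: read every header, remember where its data starts,
           then fseek() past the data to the next header. */
        struct index_entry idx[1000];
        size_t n = 0;
        struct header h;
        while (n < 1000 && fread(&h, sizeof h, 1, fp) == 1) {
            idx[n].hdr = h;
            idx[n].data_offset = ftell(fp);
            n++;
            if (fseek(fp, (long)h.data_size, SEEK_CUR) != 0)
                break;
        }

        /* On demand: seek to the wanted block and read only its data. */
        size_t wanted = 0;                    /* chosen by the caller */
        char *data = malloc(idx[wanted].hdr.data_size);
        if (data) {
            fseek(fp, idx[wanted].data_offset, SEEK_SET);
            fread(data, 1, idx[wanted].hdr.data_size, fp);
        }

        free(data);
        fclose(fp);
        return 0;
    }
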
This works fine, but I was wondering whether it might be possible to speed things up using e.g. aio, mmap or other techniques I have only heard of.

Any thoughts?

Joakim

Answer

Most of the time is probably spent accessing the disk, so perhaps buying an SSD is sensible. (Whatever you do, your application is I/O-bound.)

Apparently, your file is only about 100 MB. You could get it into the kernel's page cache just by reading it, e.g. with cat yourfile > /dev/null before running your program. For such a small file (it fits in RAM on any reasonable machine), I wouldn't worry that much.
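
If you would rather warm the cache from inside the program than rely on an external cat, posix_fadvise(2) with POSIX_FADV_WILLNEED gives the kernel a similar hint. A minimal sketch (the file name blocks.bin is just a placeholder):

    #define _POSIX_C_SOURCE 200112L
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("blocks.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Ask the kernel to start reading the whole file into the page
           cache (len == 0 means "to the end of the file"). This is only
           a hint and returns immediately. */
        int err = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
        if (err != 0)
            fprintf(stderr, "posix_fadvise: error %d\n", err);

        /* ... proceed to read headers/blocks as usual ... */
        close(fd);
        return 0;
    }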

You could pre-process the file, e.g. into a database (sqlite, or a real RDBMS like PostgreSQL) or just a gdbm indexed file.
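
For instance, a minimal gdbm sketch that stores each block's file offset keyed by a block id (the id and offset values here are made up; link with -lgdbm):

    #include <gdbm.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        GDBM_FILE db = gdbm_open("blocks.idx", 0, GDBM_WRCREAT, 0644, NULL);
        if (!db) { fprintf(stderr, "gdbm_open failed\n"); return 1; }

        unsigned int block_id = 42;   /* hypothetical block identifier    */
        long offset = 123456L;        /* hypothetical offset of its data  */

        datum key = { (char *)&block_id, sizeof block_id };
        datum val = { (char *)&offset,   sizeof offset   };
        gdbm_store(db, key, val, GDBM_REPLACE);

        /* Later: look up the offset without scanning the binary file. */
        datum found = gdbm_fetch(db, key);
        if (found.dptr) {
            long off;
            memcpy(&off, found.dptr, sizeof off);
            printf("block %u starts at offset %ld\n", block_id, off);
            free(found.dptr);         /* gdbm_fetch returns malloc()ed data */
        }

        gdbm_close(db);
        return 0;
    }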

If using <stdio.h> you could set a bigger buffer with setbuffer (or the standard setvbuf), or call fopen with the "rm" mode (the m is a GNU glibc extension asking for the file to be mmap-ed).
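
A sketch of both options (the file name and the 1 MiB buffer size are arbitrary):

    #include <stdio.h>

    int main(void)
    {
        /* Option 1: a plain stream with a much larger stdio buffer. */
        FILE *fp = fopen("blocks.bin", "r");
        if (!fp) { perror("fopen"); return 1; }

        static char buf[1 << 20];
        /* setbuffer(fp, buf, sizeof buf) is the BSD spelling of the same idea. */
        if (setvbuf(fp, buf, _IOFBF, sizeof buf) != 0)
            fprintf(stderr, "setvbuf failed, keeping the default buffer\n");

        /* ... read as before ... */
        fclose(fp);

        /* Option 2: ask glibc to mmap the file behind the FILE*
           ("m" is a GNU extension, silently ignored elsewhere). */
        FILE *fm = fopen("blocks.bin", "rm");
        if (fm) {
            /* ... read as before; no explicit buffer management needed ... */
            fclose(fm);
        }
        return 0;
    }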

You could use mmap with madvise.
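
A sketch combining the two (file name, offset and length are placeholders; a real block offset must be rounded down to a page boundary before calling madvise):

    #define _DEFAULT_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("blocks.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        void *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* The access pattern is scattered blocks, so turn off the
           default sequential readahead. */
        madvise(base, st.st_size, MADV_RANDOM);

        /* When a block is about to be used, ask the kernel to fault it
           in early. Offset 0 is page-aligned here by construction. */
        off_t  blk_off = 0;                 /* from your header index */
        size_t blk_len = 4096;              /* size of that block     */
        madvise((char *)base + blk_off, blk_len, MADV_WILLNEED);

        /* The block is now addressable directly, no read() needed. */
        const unsigned char *block = (const unsigned char *)base + blk_off;
        printf("first byte: %u\n", (unsigned)block[0]);

        munmap(base, st.st_size);
        close(fd);
        return 0;
    }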

You could (perhaps in a separate thread) use the readahead syscall.
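
A sketch with a prefetch thread (file name, offset and length are placeholders; compile with -pthread):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Describes one region to prefetch into the page cache. */
    struct prefetch_arg {
        int    fd;
        off_t  offset;
        size_t length;
    };

    static void *prefetch_thread(void *p)
    {
        struct prefetch_arg *a = p;
        /* readahead() blocks until the data is in the page cache;
           a failure just means no prefetch, not a fatal error. */
        if (readahead(a->fd, a->offset, a->length) != 0)
            perror("readahead");
        return NULL;
    }

    int main(void)
    {
        int fd = open("blocks.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* e.g. the next block you expect to need */
        struct prefetch_arg arg = { fd, 0, 100 * 1024 };
        pthread_t tid;
        if (pthread_create(&tid, NULL, prefetch_thread, &arg) != 0) {
            perror("pthread_create");
            return 1;
        }

        /* ... do other work, then read the block with pread()/fread()
           as usual; it should already be cached by then ... */

        pthread_join(tid, NULL);
        close(fd);
        return 0;
    }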

But your file seems small enough that you should not bother that much. Are you sure it is really a performance issue? Do you read that file many thousands of times per day, or do you have many hundreds of such files?
