Searching a text file backwards from the end

Question

I'm trying to find the string that contains a given substring in a text file, starting at the end. The file has tens of millions of lines. (The requirement is to read from the end of the file. I cannot use sed/awk/grep, etc.) The program below does the job, but it takes a long time. How can I make it run faster?

Accepted Answer

As noted by commenters, doing IO one character at a time (plus fseek and ftell) is very expensive. Remember that each time the code reads the first character after a seek, the C library will typically 'read ahead', filling a buffer of BUFSIZ characters.

As an alternative, consider 'read before' logic, based on the ideas provided above:

- Find the length of the pattern, pl.
- Allocate a buffer of size bs.
- Read blocks of bs characters backward. Each block overlaps the previous block by pl characters, so a match that straddles a block boundary cannot be missed.

Assuming bs is much larger than pl, the cost of the overlapping reads is minimal, and performance is close to optimal.

Error checks are reduced to keep the code clear.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BS 100000

int main(int argc, char **argv)
{
    if (argc < 3) {
        fprintf(stderr, "usage: %s file pattern\n", argv[0]);
        exit(1);
    }

    FILE *fp = fopen(argv[1], "r");
    if (!fp) {
        perror("fopen");
        exit(1);
    }

    /* Find the file size by seeking to the end. */
    fseek(fp, 0, SEEK_END);
    long pos = ftell(fp);
    fprintf(stderr, "S=%ld\n", pos);

    const char *pattern = argv[2];
    const int PL = strlen(pattern);

    /* Read backward, one block at a time. */
    char buff[BS + 1];
    long match = -1;
    while (pos > 0) {
        /* Step back by BS-PL so consecutive blocks overlap by PL characters. */
        pos = pos - (BS - PL);
        if (pos < 0) pos = 0;
        fseek(fp, pos, SEEK_SET);
        int n = fread(buff, sizeof(buff[0]), BS, fp);
        if (n > 0) {
            buff[n] = 0;    /* NUL-terminate so strstr can be used */
            /* strstr returns the first occurrence inside this block. */
            char *loc = strstr(buff, pattern);
            if (loc) {
                match = pos + (loc - buff);
                break;
            }
        }
    }
    fclose(fp);
    printf("MATCH=%ld\n", match);
}
```

Note that this solution only handles TEXT files. strstr will NOT work on data loaded from a binary file, which may contain NUL characters: the search stops at the first NUL in the buffer.

Updated 2019-11-12 to correct the file position calculation when the match is in the first block.

Answer