getline() is repeatedly reading the file, when fork() is used

I am developing a simple shell program (a command-line interpreter) and want to read input from a file line by line, so I used the getline() function. At first the program works correctly; however, when it reaches the end of the file, instead of terminating, it starts reading the file from the beginning again and runs forever. Here is the code in main that is related to getline():

int main(int argc,char *argv[]){
    int const IN_SIZE = 255;
    char *input = NULL;
    size_t len = IN_SIZE;
    // get file address
    fileAdr = argv[2];

    // open file
    srcFile = fopen(fileAdr, "r");

    if (srcFile == NULL) {
        printf("No such file!\n");
        exit(-1);
    }

    while (getline( &input, &len, srcFile) != -1) {
        strtok(input, "\n");
        printf("%s\n", input);
        // some code that parses input, firstArgs == input
        execSimpleCmd(firstArgs);            
    }
    fclose(srcFile);
}

I am using fork() in my program, and that is most probably what causes this problem.

void execSimpleCmd(char **cmdAndArgs) {

    pid_t pid = fork();
    if (pid < 0) {
        // error
        fprintf(stderr, "Fork Failed");
        exit(-1);
    } else if (pid == 0) {
        // child process
        if (execvp(cmdAndArgs[0], cmdAndArgs) < 0) {
            printf("There is no such command!\n");
        }
        exit(0);
    } else {
        // parent process
        wait(NULL);
        return;
    }
}

In addition, the program sometimes reads and prints a combination of multiple lines. For example, given the input file below:

ping
ww    
ls
ls -l
pwd

it prints something like pwdg, pwdww, etc. How can I fix this?

Answer

It appears that closing a FILE in some cases seeks the underlying file descriptor back to the position where the application actually read to, effectively undoing the effect of the read buffering. This matters, since the OS level file descriptors of the parent and the child point to the same file description, and the same file offset in particular.

The POSIX description of fclose() has this phrase:

[CX] If the file is not already at EOF, and the file is one capable of seeking, the file offset of the underlying open file description shall be set to the file position of the stream if the stream is the active handle to the underlying file description.

(Where CX means an extension to the ISO C standard, and exit() of course runs fclose() on all streams.)
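
To make the shared offset concrete, here is a minimal sketch that uses a raw file descriptor and no stdio at all. The helper name offset_seen_by_parent is hypothetical and only there for illustration: the child moves the offset with lseek(), and the parent then asks where its own descriptor is.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the question): open `path`, fork, let the
 * child seek to `child_pos`, and return the offset the parent observes on
 * its own copy of the descriptor afterwards. Returns -1 on error. */
off_t offset_seen_by_parent(const char *path, off_t child_pos)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }
    if (pid == 0) {
        /* Child: move the offset of the shared open file description,
         * then leave without any stdio cleanup. */
        lseek(fd, child_pos, SEEK_SET);
        _exit(0);
    }

    /* Parent: wait for the child, then query the offset without moving it. */
    waitpid(pid, NULL, 0);
    off_t seen = lseek(fd, 0, SEEK_CUR);
    close(fd);
    return seen;
}
```

Calling offset_seen_by_parent("testfile", 4) should return 4 even though the parent never seeked itself, because both descriptors refer to the same open file description.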

I can reproduce the odd behavior with this program (on Debian 9.8):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char *argv[]){
    FILE *f;
    if ((f = fopen("testfile", "r")) == NULL) {
        perror("fopen");
        exit(1);
    }

    int right = 0;
    if (argc > 1)
        right = 1;

    char *line = NULL;
    size_t len = 0;
    // first line 
    if (getline(&line, &len, f) > 0)
        printf("%s", line);

    pid_t p = fork();
    if (p == -1) {
        perror("fork");
    } else if (p == 0) {
        if (right)
            _exit(0);  // exit the child 
        else
            exit(0);   // wrong way to exit
    } else {
        wait(NULL);  // parent
    }

    // rest of the lines
    while (getline(&line, &len, f) > 0) {
        printf("%s", line);
    }

    fclose(f);
}

Then:

$ printf 'a\nb\nc\n' > testfile
$ gcc -Wall -o getline getline.c
$ ./getline
a
b
c
b
c

Running it with strace -f ./getline clearly shows the child seeking the file descriptor back:

clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f63794e0710) = 25117
strace: Process 25117 attached
[pid 25116] wait4(-1,  <unfinished ...>
[pid 25117] lseek(3, -4, SEEK_CUR)      = 2
[pid 25117] exit_group(1)               = ?

(I didn’t see the seek back in code that didn’t involve forking, but I don’t know why.)

So, what happens is that the C library in the main program reads a block of data from the file, and the application prints the first line. After the fork, the child exits, and in doing so seeks the fd back to where the application-level file position is. Then the parent continues, processes the rest of its read buffer, and when it’s finished, it continues reading from the file. Because the file descriptor was moved back, the lines starting from the second are available again.
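
The gap between the two positions can be observed directly. The following is a small sketch (the helper name positions_after_first_line is made up for this illustration) that reads one line with getline() and then reports both the stream position from ftell() and the descriptor offset from lseek():

```c
#define _POSIX_C_SOURCE 200809L  /* for getline() and fileno() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the first line of `path` through stdio and
 * report where the stream thinks it is versus where the underlying file
 * descriptor actually is. Returns 0 on success, -1 on error. */
int positions_after_first_line(const char *path, long *stream_pos, off_t *fd_pos)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    if (getline(&line, &cap, f) == -1) {
        free(line);
        fclose(f);
        return -1;
    }

    *stream_pos = ftell(f);                   /* what the application consumed */
    *fd_pos = lseek(fileno(f), 0, SEEK_CUR);  /* what stdio read ahead to */

    free(line);
    fclose(f);
    return 0;
}
```

For the six-byte testfile above, I would expect a stream position of 2 and a descriptor offset of 6 with a typical buffered stdio, which matches the lseek(3, -4, SEEK_CUR) = 2 in the strace output.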

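In your case, the child only reaches exit(0) when execvp() fails, as it would for a line that is not a valid command. Since that seems to happen again on every pass over the file, the offset keeps being moved back and the result is an infinite loop.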

Using _exit() instead of exit() in the child fixes the problem in this case, since _exit() only terminates the process and doesn’t do any housekeeping with the stdio buffers.
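
Applied to the question's execSimpleCmd(), that could look roughly like the sketch below. Returning the child's exit status, the 127 exit code, and the use of waitpid() and perror() are my own choices for illustration, not something the question's code does.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Sketch of the question's function with the child leaving via _exit().
 * Returns the command's exit status, or -1 if it could not be obtained. */
int execSimpleCmd(char **cmdAndArgs)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: on success execvp() never returns. */
        execvp(cmdAndArgs[0], cmdAndArgs);
        /* Only reached if execvp() failed. stderr is unbuffered, so the
         * message is written even though _exit() flushes nothing. */
        perror(cmdAndArgs[0]);
        _exit(127);  /* no stdio cleanup, so no seek back on the shared fd */
    }

    /* Parent: wait for this particular child and report how it ended. */
    int status;
    if (waitpid(pid, &status, 0) == -1) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The getline() loop in main can stay as it is; the only behavioral change that matters here is that a failed execvp() no longer runs exit() in the child.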

With _exit(), any output buffers are also not flushed, so you’ll need to call fflush() manually on stdout and any other files you’re writing to.
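
Here is a small sketch of what that means in practice. The helper bytes_written_by_child is hypothetical: the child writes a short message through stdio and leaves with _exit(), with or without an explicit fflush() first, and the parent then checks how many bytes actually reached the file.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: the child writes `msg` to `path` via stdio and then
 * calls _exit(); `do_flush` selects whether it calls fflush() before that.
 * Returns the size of the file afterwards, or -1 on error. */
long bytes_written_by_child(const char *path, const char *msg, int do_flush)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        FILE *out = fopen(path, "w");
        if (out == NULL)
            _exit(1);
        fputs(msg, out);   /* a short message only lands in the stdio buffer */
        if (do_flush)
            fflush(out);   /* push the buffer to the file by hand */
        _exit(0);          /* skips the flushing that exit() would do */
    }

    waitpid(pid, NULL, 0);

    /* Parent: measure what really ended up in the file. */
    FILE *in = fopen(path, "r");
    if (in == NULL)
        return -1;
    fseek(in, 0, SEEK_END);
    long size = ftell(in);
    fclose(in);
    return size;
}
```

With a short message such as "hello\n" I would expect 6 bytes when do_flush is set and an empty file when it is not, since the data never leaves the child's buffer.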

However, if you did this the other way around, with the child reading and buffering more than it processes, then it would be useful for the child to seek the fd back so that the parent could continue from where the child actually left off.

Another solution would be not to mix stdio with fork().
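
One way to read that suggestion is to keep stdio for reading and avoid fork() in your own code, for example by launching commands with posix_spawnp(). This is my own substitution rather than something the question uses, and the name spawnSimpleCmd is hypothetical. Your code never runs in a child that holds a copy of the stdio buffers, so nothing in your program can call exit() there.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical alternative to execSimpleCmd(): start the command with
 * posix_spawnp() instead of fork() + execvp(). Returns the command's exit
 * status, or -1 if spawning or waiting failed. */
int spawnSimpleCmd(char **cmdAndArgs)
{
    pid_t pid;
    /* No file actions and no spawn attributes; pass on our environment. */
    int err = posix_spawnp(&pid, cmdAndArgs[0], NULL, NULL, cmdAndArgs, environ);
    if (err != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

How a nonexistent command is reported depends on the implementation: it can show up as an error return from posix_spawnp() or as a child that exits with status 127, so callers should be prepared for either.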

User contributions licensed under: CC BY-SA