
Processing stdout and stderr separately adding a timestamp: Wrong order

These are my test files:

std-test.sh:

#!/bin/bash
# Emit the numbers 0..20, sending each line to stdout roughly 70% of
# the time and to stderr the remaining 30%.
for i in {0..20}
do
    number=$((RANDOM % 10))
    if [ "$number" -le 6 ]
    then
        echo "$i"
    else
        echo "$i" 1>&2
    fi
done

process.sh:

#!/bin/bash
# Read lines from stdin and prepend a millisecond timestamp.
# The first argument selects the destination: "err" appends to a file
# named stderr, "out" appends to a file named stdout.

while IFS= read -r line; do
    [[ $1 = "err" ]] && echo "$(date +%s%3N) $line" >> stderr
    [[ $1 = "out" ]] && echo "$(date +%s%3N) $line" >> stdout
done

std-test.sh prints 21 lines (the numbers 0 through 20), randomly split between stdout and stderr. process.sh reads another command's output when piped to it, prepends a millisecond timestamp to each line, and appends the result to a file named stderr when err is passed as its argument, or to a file named stdout when out is passed. The command I use to test the whole thing (taken from here) is:

./std-test.sh 2> >(./process.sh err) > >(./process.sh out)

My test outputs are as follows:

stdout:

1486297129986 0
1486297129987 1
1486297129988 2
1486297129988 3
1486297129989 4
1486297129990 6
1486297129991 7
1486297129991 8
1486297129992 9
1486297129993 10
1486297129993 11
1486297129994 12
1486297129995 14
1486297129995 17
1486297129996 18

stderr:

1486297129986 5
1486297129987 13
1486297129987 15
1486297129988 16
1486297129989 19
1486297129990 20

When I tried parsing the output, I realized that the order of stderr and stdout is completely mixed up. Why does 5 in stderr come before 1 in stdout, for instance? Or 13 right after 1? What mistake did I make here, and how can I resolve it?

Edit #1: I know I could pipe stderr and stdout directly to files, but process.sh will eventually curl the results to a remote server; saving them to files is just a proof of concept.

Edit #2: I am actually trying to timestamp and process docker logs programmatically.


Answer

Update:

From the comments it appears the OP was really trying to timestamp and log the output of an application running in a Docker container.

docker logs -t already does this and is the best tool for the job.
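
For reference, a minimal usage sketch (the container name my-app is a placeholder):

# Print the container's logs with a timestamp on every line
docker logs -t my-app

# Stream new log lines as they arrive, like tail -f
docker logs -t -f my-app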


Original answer:

Probably because you're timestamping each line with the time it is processed, not the time it is actually generated?

You launch three processes when you run that command, and not all of them run in parallel.

The processes don't run in a truly parallel fashion; the kernel can preempt any of them whenever it decides to. Because the stdout stream carries more lines and therefore more processing work, its reader is more likely to be preempted and resumed later, which would explain the timestamps you see.
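
If you need timestamps that reflect when a line was generated rather than when it was read, one option is to stamp each line in the producer itself, before it enters any pipe. A minimal sketch based on std-test.sh (this assumes you can modify the producer, which may not hold for an arbitrary application):

#!/bin/bash
# Stamp each line at generation time, so scheduling and buffering
# further down the pipeline can no longer reorder the timestamps.
for i in {0..20}
do
    number=$((RANDOM % 10))
    if [ "$number" -le 6 ]
    then
        echo "$(date +%s%3N) $i"
    else
        echo "$(date +%s%3N) $i" 1>&2
    fi
done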

What do you really need the timestamps on the lines for? Do you simply want an ordering between the lines, or do you need true wall-clock timestamps?

Edit: as @Sven Festersen pointed out in the comments, it might also be because stdout is buffered by default when piped.
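
If buffering turns out to be the culprit, one experiment is to force line-buffered output with stdbuf from GNU coreutils (this only affects programs that use C stdio buffering, and stderr is unbuffered by default, so it is something to try rather than a guaranteed fix):

# Ask the producer to flush its stdout after every line
stdbuf -oL ./std-test.sh 2> >(./process.sh err) > >(./process.sh out)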
