I’m trying to prepare a few shuffled files with the Linux shuf command. However, shuf --help doesn’t list any “seed” option. I wonder if there is any workaround? Thanks!
Updates:
Thanks to the reminder in the comments, I realize that (1) shuf FILE produces a different result each time, and (2) when the --random-source is the same, the results are the same. So I guess it can be used to make the shuffling procedure reproducible?
Then, just out of curiosity: I have long thought that randomness is controlled by the random seed, but as the man page says, --random-source specifies where to read the “random bytes” from. How are these random bytes related to the random seed?
Answer
I wonder if there is any workaround?
There is the --random-source option.
I guess it can be used to make the shuffling procedure reproducible?
This is what --random-source is for.
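As a minimal sketch of that, assuming GNU shuf and a readable /dev/urandom: save a block of random bytes to a file once, then pass the same file as --random-source on every run. The variable names and the 4096-byte size are arbitrary choices for illustration. The file has to hold enough bytes for the size of the input, otherwise shuf should stop with an end-of-file error.

```shell
# Save a fixed block of random bytes once (size and location are arbitrary).
bytes=$(mktemp)
head -c 4096 /dev/urandom > "$bytes"

# Shuffle the numbers 1..20 twice, reading "randomness" from the same file.
first=$(shuf --random-source="$bytes" -i 1-20)
second=$(shuf --random-source="$bytes" -i 1-20)

# Both runs consumed identical bytes, so the order is identical.
if [ "$first" = "$second" ]; then
  echo "same order both times"
fi
```

Keeping the byte file alongside the data is what makes the shuffle repeatable later; a new file gives a new order.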
How are these random bytes related to the random seed?
It is the source of random bytes.
GNU shuf uses a heavily optimized algorithm, and which algorithm it runs depends strongly on the other options given. As a crude oversimplification, it chooses a random number from 1 to the number of lines in the file and prints the line with that number. Then it repeats, choosing another number that has not been used yet.
Browsing the GNU shuf.c sources, I see that the number generation is done by randint_genmax() in shuf/randint.c.
Basically, I believe you are conflating two different things. Pseudorandom number generators do use a seed. However, shuf is intended to work with /dev/urandom, a special file for which the kernel itself keeps a global seed, and more generally with any random number generation method. So instead of requiring a seed for one specific random number generation method, shuf requires a stream that provides random bytes when read.
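If you still want a seed, one option is to turn the seed into such a byte stream yourself. The sketch below follows the approach suggested in the GNU coreutils manual: encrypt an endless stream of zero bytes with a key derived from the seed. It assumes openssl is installed and that /dev/stdin is available (as on Linux). The function names get_seeded_random and seeded_shuf are made up for this example, and seeded_shuf is only a convenience wrapper.

```shell
# Hypothetical helper: emit an endless, deterministic byte stream for a seed.
# Assumes openssl is installed; its warnings on stderr are discarded.
get_seeded_random() {
  openssl enc -aes-256-ctr -pass pass:"$1" -nosalt </dev/zero 2>/dev/null
}

# Hypothetical wrapper: shuffle the lines of FILE reproducibly for SEED.
# shuf reads its random bytes from the pipe via /dev/stdin.
seeded_shuf() {
  get_seeded_random "$1" | shuf --random-source=/dev/stdin "$2"
}

# Usage: the same seed gives the same order for the same input.
input=$(mktemp)
seq 1 50 > "$input"
run_a=$(seeded_shuf 42 "$input")
run_b=$(seeded_shuf 42 "$input")
run_c=$(seeded_shuf 43 "$input")
```

Here run_a and run_b should be identical, while run_c, produced from a different seed, should come out in a different order. In bash the same stream can also be passed with process substitution, as in shuf --random-source=<(get_seeded_random 42) FILE.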