Why does the Linux split program behave strangely with large files (>20 GB)?

I’m running the following command on my Ubuntu machine:

split --number=l/5 /pathToSource.csv /pathToOutputDirectory
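For context, --number=l/5 asks split for five chunks while keeping every line whole, which is why the pieces need not be equal in size. A minimal illustration on a throwaway file (demo.txt and the part. prefix are just example names):

seq 1000 > demo.txt
split --number=l/5 demo.txt part.
wc -l part.*    # five pieces, each made of complete lines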

If I do an “ls”:

myUser@serverNAme:/pathToOutputDirectory> ls -la

total 21467452 
drwxr-xr-x 2 myUser group        4096 Jun 23 08:51 .
drwxrwxrwx 4 myUser group        4096 Jun 23 08:44 ..
-rw-r--r-- 1 myUser group 10353843231 Jun 23 08:48 aa
-rw-r--r-- 1 myUser group           0 Jun 23 08:48 ab
-rw-r--r-- 1 myUser group 11376663825 Jun 23 08:51 ac
-rw-r--r-- 1 myUser group           0 Jun 23 08:51 ad
-rw-r--r-- 1 myUser group   252141913 Jun 23 08:51 ae

If I do a “du” on the ab and ad files:

$ du -h ab ad
0   ab
0   ad

As you can see, split divided the file very unevenly. Does anyone know what’s going on? Could some unprintable character be tripping up split? Thank you. Best regards, Francisco.


Answer

While this is unusual data, with an average line length of 114137, I’m not sure that fully explains the issue. You have 21982648969 bytes of data, so each bucket that split tries to fill is 4396529793 bytes. That’s larger than 2^32, so I wondered whether we have a 32-bit overflow (the arithmetic is checked below); are you on a 32-bit or a 64-bit platform? Looking at the code, though, I don’t see an overflow issue. Note that you could anonymize and compress the data and provide the following file for download somewhere:

tr -c '\n' . < /pathToSource.csv | xz > /pathToSource.csv.xz
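For reference, the bucket arithmetic above can be checked with plain shell arithmetic; the inputs are the file sizes from the ls listing in the question:

echo $((10353843231 + 11376663825 + 252141913))   # 21982648969 bytes in total
echo $((21982648969 / 5))                         # 4396529793 bytes per l/5 bucket
echo $((2 ** 32))                                  # 4294967296, so each bucket exceeds 2^32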

It’s also worth specifying your coreutils version, since the implementation changed a bit between v8.8 and v8.13.
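To check which version you’re running, something like the following works (the printed string is just an example):

split --version | head -n1    # prints e.g. "split (GNU coreutils) 8.13"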
