
How to split a multi-gigabyte file into chunks of about 1.5 gigabytes using Linux split?

I have a file that can be bigger than 4 GB. I am using the Linux split command to split it by lines (that is a requirement), but after splitting, each piece must be smaller than 2 GB. The original file size can vary from 3 to 5 GB. I want to add logic to my shell script that computes the number of lines to feed into the split command below, so that the split files always stay under 2 GB.

split -l 100000 -d abc.txt abc
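As an aside, if the installed split is from GNU coreutils, its -C (--line-bytes) option already does line-aligned size splitting: it packs as many whole lines as fit into each output file without exceeding the given byte size. A minimal sketch, using a small generated sample file and a 200-byte limit purely for illustration:

```shell
#!/bin/sh
set -e

# Generate a small sample input (assumption, for illustration only).
printf 'line %d\n' $(seq 1 100) > abc.txt

# Each output file holds whole lines and is at most 200 bytes.
# For the real use case this would be e.g. -C 1500000000.
split -C 200 -d abc.txt abc

# Show the chunk sizes.
wc -c abc0*
```

Concatenating the chunks in order reproduces the original file exactly, since split never breaks a line across chunks.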


Answer

Here is how I solved the problem. Sorry for posting the solution late.

1. Declared a variable for the target split size of about 1.5 GB, which leaves headroom under the 2 GB limit:

DEFAULT_SPLITFILE_SIZE=1500000000

2. Calculated the number of lines in the file:

LINES_IN_FILE=`wc -l "$file" | awk '{print $1}'`

echo `date`  "Total line count = ${LINES_IN_FILE}."

3. Calculated the size of the file in bytes (GNU stat):

FILE_SIZE=`stat -c %s "${file}"`

4. Calculated the average size of a line in the file (integer division):

SIZE_PER_LINE=$(( FILE_SIZE / LINES_IN_FILE ))

echo `date`  "Bytes Per Line = $SIZE_PER_LINE"

5. Calculated the number of lines that add up to roughly one 1.5 GB split file:

SPLIT_LINE=$(( DEFAULT_SPLITFILE_SIZE / SIZE_PER_LINE ))

echo `date`  "Lines for Split = $SPLIT_LINE"
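Putting the steps together, the whole flow can be sketched as one script, ending with the split call from the question. The sample input generated at the top is an assumption for illustration; in the real use case $file would be the 3-5 GB input:

```shell
#!/bin/sh
set -e

# Sample input file (assumption, for illustration only).
printf 'line %d\n' $(seq 1 1000) > abc.txt
file=abc.txt

# Target split size: ~1.5 GB, leaving headroom under the 2 GB limit.
DEFAULT_SPLITFILE_SIZE=1500000000

# Number of lines in the file.
LINES_IN_FILE=$(wc -l < "$file")

# Size of the file in bytes (GNU stat; on BSD/macOS this would be stat -f %z).
FILE_SIZE=$(stat -c %s "$file")

# Average bytes per line (integer division, so this floors).
SIZE_PER_LINE=$(( FILE_SIZE / LINES_IN_FILE ))

# Lines per ~1.5 GB chunk.
SPLIT_LINE=$(( DEFAULT_SPLITFILE_SIZE / SIZE_PER_LINE ))

echo "$(date) Lines for split = $SPLIT_LINE"

# Split by that many lines, with numeric suffixes (abc00, abc01, ...).
split -l "$SPLIT_LINE" -d "$file" abc
```

One caveat with this approach: SIZE_PER_LINE is an average, so a file whose lines vary wildly in length could still produce a chunk over the target; the 0.5 GB of headroom below the 2 GB limit is what absorbs that.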
User contributions licensed under: CC BY-SA