I have a file that can be bigger than 4GB. I am using the linux split
command to split it by lines (that’s the requirement).
But after splitting the original file, I want the size of split file to be always less than 2GB.
The original file size can vary from 3-5 GB.
I want to write some logic for this in my shell script and feed the number of lines into my split
command below to keep the split file sizes less than 2 GB.
split -l 100000 -d abc.txt abc
Answer
This is how I solved the problem. Sorry for posting the solution late.
1. Declared a variable for the target split-file size of 1.5 GB, leaving headroom under the 2 GB limit.
DEFAULT_SPLITFILE_SIZE=1500000000
2. Counted the number of lines in the file.
LINES_IN_FILE=`wc -l "$file" | awk '{print $1}'`
echo `date` "Total line count = ${LINES_IN_FILE}."
3. Calculated the size of the file in bytes.
FILE_SIZE=`stat -c %s "${file}"`
4. Calculated the average size of a line in the file (integer division).
SIZE_PER_LINE=$(( FILE_SIZE / LINES_IN_FILE ))
echo `date` "Bytes Per Line = $SIZE_PER_LINE"
5. Calculated the number of lines that add up to roughly a 1.5 GB split file.
SPLIT_LINE=$(( DEFAULT_SPLITFILE_SIZE / SIZE_PER_LINE ))
echo `date` "Lines for Split = $SPLIT_LINE"
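The steps above can be sketched as one reusable function, with the computed line count fed back into the split command from the question. The function name `split_line_count` and the demo file name are my own; `stat -c %s` is the GNU/Linux form used earlier. Note the average-line-size heuristic: integer division rounds the per-line estimate down, so the actual chunks can come out slightly larger than 1.5 GB, which is exactly why the 0.5 GB headroom under the 2 GB limit is useful.

```shell
#!/bin/sh
# Target chunk size: ~1.5 GB, leaving headroom under the 2 GB limit.
DEFAULT_SPLITFILE_SIZE=1500000000

# Prints how many lines per chunk keep each split file near the target.
split_line_count() {
    file=$1
    LINES_IN_FILE=$(wc -l < "$file")          # total lines
    FILE_SIZE=$(stat -c %s "$file")           # total bytes (GNU stat)
    # Average bytes per line; rounds down, so the line count below
    # is a slightly generous estimate absorbed by the headroom.
    SIZE_PER_LINE=$(( FILE_SIZE / LINES_IN_FILE ))
    echo $(( DEFAULT_SPLITFILE_SIZE / SIZE_PER_LINE ))
}
```

Usage with the file and prefix from the question:

split -l "$(split_line_count abc.txt)" -d abc.txt abc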