Is there a way to unzip part of a .gz file without having to unzip it all?
I have a large (~139 GB) gzipped .csv.gz file. I have been told that the .csv file has ~540M rows of data. I only need a sample of the data, and I would be happy for it to be the first 1M rows (which would be roughly 250 MB of the compressed file). I am happy to extract by number of rows or by number of bytes, but would prefer not to decompress the entire file just to access a sample.
Answer
Untested (I don’t have [or care to create] a gzipped CSV file that big):
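zcat file.csv.gz | head -n 1000000 > extract.csv
This only decompresses as much as it needs: gzip is decompressed as a stream from the start of the file, and once head has read 1,000,000 lines it exits, the pipe closes, and zcat stops without reading the rest of the archive.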
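Since the question also allows sampling by number of bytes, here is a minimal sketch of both variants wrapped as shell functions. The function names first_lines and first_bytes are made up for illustration, and the sketch assumes a head that supports -c (GNU and BSD head do). It uses gzip -dc instead of zcat because on some systems zcat expects a .Z suffix. Like the command above, it has not been run against a file of this size.

```shell
# Hypothetical helper names; neither is a standard command.

# Print the first N lines of a gzip-compressed file to stdout.
# Usage: first_lines FILE N
first_lines() {
    gzip -dc "$1" | head -n "$2"
}

# Print the first N decompressed bytes of a gzip-compressed file.
# The final row will usually be cut off mid-line.
# Usage: first_bytes FILE N
first_bytes() {
    gzip -dc "$1" | head -c "$2"
}
```

For example, first_lines file.csv.gz 1000001 > extract.csv would keep a header line plus 1M data rows, assuming the file has a header. Note that head counts lines, not CSV records, so the row count will be off if any quoted field contains an embedded newline.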