I have the following command to open a tbz file:
# pricing20150304.tbz
tar xpj -C {tarball} {files_to_unarchive}
The compressed file is about 15 GB, and it expands to about 500 GB. On an EC2 4xlarge instance, this operation takes roughly 1h40m.
Is there a way to optimize this operation? What would be the fastest way to do the above operation?
Answer
A couple possibilities come to mind. First off, bzip2 is pretty slow, so if you can use a different algorithm you might want to consider doing so. Assuming you still want a fairly high ratio, LZHAM and Brotli might be good choices; they take longer to compress but are much faster when it comes to decompression, and IIRC both come with multi-threaded decompressors. There are lots of choices, and they all have different trade-offs between compression speed, decompression speed, and ratio.
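If a different algorithm isn’t an option, you might want to consider using pbzip2 instead of bzip2. Something like:
pbzip2 -dc infile.tar.bz2 | tar x
Keep in mind that pbzip2 can only spread decompression across cores for archives that were themselves created with pbzip2; a file produced by plain bzip2 is a single stream and will still decompress on one core.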
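A rough sketch of how the pbzip2 pipeline could be wrapped so that it falls back to plain bzip2 when pbzip2 isn't installed. The function names pick_bz2_tool and extract_tbz are made up for illustration, and it assumes a tar that accepts -C and -f - (GNU tar and bsdtar both do):

```shell
#!/bin/sh
# Hypothetical helper: print the first available bzip2-compatible
# decompressor, preferring the multi-threaded pbzip2.
pick_bz2_tool() {
    for tool in pbzip2 bzip2; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            return 0
        fi
    done
    return 1
}

# Hypothetical helper: decompress TARBALL to stdout and let tar unpack
# the stream into DEST, preserving permissions (-p) as in the question.
extract_tbz() {
    tarball=$1
    dest=$2
    tool=$(pick_bz2_tool) || {
        echo "no bzip2 decompressor found" >&2
        return 1
    }
    mkdir -p "$dest" || return 1
    "$tool" -dc "$tarball" | tar -xp -C "$dest" -f -
}
```

Usage would be something like extract_tbz pricing20150304.tbz /some/output/dir, where the output directory is a placeholder. Splitting decompression and unpacking into two processes also lets them run on separate cores, even with plain bzip2.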