
Linux sed reads whole file when only editing first line

I am currently working with CSV files that can be tens of GB in size, and need to edit the headers dynamically depending on the use case.

For this I am using:

sed -i '1s/id/id:ID(Person)/g' etc.

which has the desired effect of only editing the headers, but can take upwards of 10 seconds to complete. I imagine this is because the whole file is still being streamed, but I cannot find a way to stop this from occurring.

Any ideas or a point in the right direction would be greatly appreciated.


Answer

sed is not the problem. The problem is that you are streaming a 10 GB file. If this is the only operation you are doing on it, sed is probably not much worse than any other line-based utility (awk, etc.).
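For instance, a roughly equivalent awk command (the filenames here are hypothetical) still reads and rewrites every byte, so it lands in the same ballpark:

# Edit only line 1, print every line; the whole file is still streamed.
awk 'NR==1 { gsub(/id/, "id:ID(Person)") } 1' people.csv > people.tmp \
  && mv people.tmp people.csv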

Perl may do a better job if you read the whole file first, but your memory footprint is going to be pretty big, and depending on your system, you may start paging.
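As a sketch of that slurp-first approach (the filename is hypothetical, and the /r substitution flag needs Perl 5.14 or newer), -0777 reads the entire file into memory before the first-line edit:

# Slurp the whole file (-0777), then rewrite only the first line in place.
perl -0777 -pi -e 's/\A[^\n]*/$& =~ s|id|id:ID(Person)|gr/e' people.csv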

If it is something you are going to do frequently and for a long time, you may be able to do better in a lower-level language by reading larger blocks of data, allowing the block layer to optimize your disk access for you. If you keep the “chunks” large enough for the block layer, but small enough to avoid paging, you should be able to hit the sweet spot.
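You can approximate that idea from the shell with GNU dd (iflag=skip_bytes is GNU-specific; the filename and header text below are hypothetical assumptions): write the new header, then copy the rest of the file in large fixed-size blocks.

new_header='id:ID(Person),name'          # assumed replacement header
old_len=$(head -n 1 people.csv | wc -c)  # bytes in the old header line
{
  printf '%s\n' "$new_header"            # emit the new header
  # Skip past the old header, then copy everything else in 8 MiB blocks.
  # The whole file is still read once, just in big sequential reads.
  dd if=people.csv bs=8M skip="$old_len" iflag=skip_bytes,fullblock status=none
} > people.tmp && mv people.tmp people.csv

The 8 MiB block size is an arbitrary starting point; tune it against your disk and available memory.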

Probably not worth it for a one-off conversion.
