How to edit 300 GB text file (genomics data)?

Question

I have a 300 GB text file that contains genomics data with over 250k records. There are some records with bad data and our genomics program ‘Popoolution’ allows us to comment out the “bad” records with an asterisk. Our problem is that we cannot find a text editor that will load the dat…

Accepted Answer

Based on your update:

"One more thought… Is there an approach that would allow us to add the asterisk to the line without opening the entire text file at once? This could be very useful given that we will have to repeat the process an unknown number of times."

Here is an approach: if you know the line number, you can add an asterisk at the beginning of that line with:

    sed 'LINE_NUMBER s/^/*/' file

For example:

    $ cat file
    aa
    bb
    cc
    dd
    ee
    $ sed '3 s/^/*/' file
    aa
    bb
    *cc
    dd
    ee

If you add -i, the file is updated in place:

    $ sed -i '3 s/^/*/' file
    $ cat file
    aa
    bb
    *cc
    dd
    ee

Even so, I think it is better to redirect the output to another file:

    sed '3 s/^/*/' file > new_file

so that your original file stays intact and the updated version is saved in new_file.
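Since you will have to repeat this an unknown number of times, it may help to mark all known bad lines in a single pass instead of running sed once per line. The sketch below assumes a hypothetical file bad_lines.txt with one line number per line, turns each number N into the command `N s/^/*/` in a hypothetical script mark.sed, and applies that script with `sed -f`. sed streams the input line by line, so the 300 GB file is never loaded into memory. The last command spot-checks one record by printing line 4 and quitting immediately, without reading the rest of the file.

```shell
#!/bin/sh
# Sketch: mark many records in one streaming pass.
# Assumptions: "bad_lines.txt" and "mark.sed" are hypothetical file names;
# bad_lines.txt holds one line number per line.

# Small demo input standing in for the real genomics file.
printf 'aa\nbb\ncc\ndd\nee\n' > file
printf '2\n4\n' > bad_lines.txt

# Turn every line number N into the sed command "N s/^/*/".
sed 's|$| s/^/*/|' bad_lines.txt > mark.sed

# Apply all generated commands in one pass; the original file is left untouched.
sed -f mark.sed file > new_file

# Spot-check a record: print line 4, then quit without reading further.
sed -n '4p;4q' new_file
```

With the demo input this marks lines 2 and 4 in new_file. Note that sed tests every generated command against every input line, so with a very long list of line numbers an awk script that looks line numbers up in an associative array may be faster.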


Answer