Grep lines from a file in batches according to a format

I have a file with contents as:

Hi
welcome
! Chunk Start
Line 1
Line2
! Chunk Start
Line 1
Line 2
Line 3
! Chunk Start
Line 1
Line 2
Line 3
Line 1
Line 2
Line 3
Line 4
Line 5
Line 1
Line 2
Line 3
Line 4

Everything from one “! Chunk Start” line up to the next “! Chunk Start” line is a chunk, and the last chunk runs to the end of the file. I need the contents of each chunk on a single line, i.e.:

Line 1 Line2
Line 1 Line 2 Line 3
Line 1 Line 2 Line 3 Line 1 Line 2 Line 3 Line 4 Line 5 Line 1 Line 2 Line 3 Line 4

I have a working solution, but I think there should be a better way. Currently I do this:

grep -A100 "! Chunk Start" file.txt

The rest of the logic concatenates the lines. What worries me is the -A100: if a chunk has more than 100 lines, this will fail. I probably need to do this with awk or sed. Please suggest.

Answer

This might work for you (GNU sed):

sed '0,/^! Chunk Start/d;:a;$!N;/! Chunk Start/!s/\n/ /;ta;P;d' file

Delete up to and including the first line containing ! Chunk Start. Then gather up lines, replacing each newline with a space. When the next ! Chunk Start line is appended, the substitution is skipped: print the first line of the pattern space (the joined chunk), delete the pattern space, and repeat.
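
Since the question also mentions awk, the same idea can be sketched there. This is only an illustrative alternative, not part of the original answer: join_chunks is a hypothetical helper name, and the sketch assumes that marker lines start with "! Chunk Start" and that empty chunks should simply be skipped.

```shell
# join_chunks is a hypothetical helper, not taken from the original answer.
# It prints the lines of each chunk, joined by single spaces, one chunk per line.
join_chunks() {
  awk '
    /^! Chunk Start/ {                    # a marker line opens a new chunk
      if (seen && buf != "") print buf    # flush the previous chunk, if any
      seen = 1
      buf = ""
      next
    }
    seen {                                # lines before the first marker are ignored
      buf = (buf == "" ? $0 : buf " " $0)
    }
    END {
      if (seen && buf != "") print buf    # flush the final chunk
    }
  ' "$@"
}
```

It would be called as join_chunks file.txt, or fed from a pipe. Like the sed version, it puts no limit on the number of lines in a chunk.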

User contributions licensed under: CC BY-SA