
Using awk to trim away parts of a text file outside 2 patterns

I’d like an elegant awk solution to edit the lines in a file. So far I’ve only managed to complete the task using 2 sed commands and 1 awk command.

Each file is composed of a header of indeterminate length, followed by the data I want to capture, then a footer which always starts with the same string (WATER). The data is made up of several 3-line chunks, each starting with the same string (GROUPS), which I’d like to concatenate into single lines.

Whenever I find GROUPS, I want to concatenate the following lines until the next occurrence of GROUPS, and repeat until I find WATER; then delete the WATER line and everything after it to the end of the file.

input:

header stuff
more header stuff
even more header stuff
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
.......
last line of data
WATER footer stuff footer stuff
footer stuff
more footer stuff
even more footer stuff

output:

GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
........
GROUPS data data data data mo data mo data even more last line of data

Any help would be greatly appreciated!

EDIT:

Here are my (probably flaky) solutions!

1:Trim header

sed -n '/GROUPS/,$p' originalfile > outputfile1

2:Trim footer

sed '/WATER/,$d' outputfile1 > outputfile2

3:Concatenate lines

awk 'NF&&$1=RS$1' RS="GROUPS" outputfile2 > finaloutputfile
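
The two trimming steps can also be folded into a single sed call, which avoids the intermediate file. The following is only a sketch: the function name trim_block is made up for illustration, and it assumes that GROUPS and WATER always appear at the start of a line, as they do in the sample input.

```shell
# trim_block: hypothetical helper name, not from the original post.
# Prints every line from the first line starting with GROUPS up to,
# but not including, the first line starting with WATER.
trim_block() {
    sed -n '/^GROUPS/,/^WATER/{
        /^WATER/!p
    }' "$@"
}
```

Called as `trim_block originalfile > outputfile2`, it should leave only the data block. One caveat: if a line starting with GROUPS shows up again in the footer, sed would open a new range there.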


Answer

Here is a GNU awk solution (GNU awk is required because the record separator is a multi-character regex and the script reads RT):

awk -v RS="GROUPS|WATER" -F"\n" 'p=="WATER"{exit} {$1=p $1}NR>1; {p=RT}' file
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more ....... last line of data

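Setting RS to GROUPS|WATER makes awk split the input at every GROUPS or WATER, so each 3-line chunk becomes one record. With the field separator set to newline, the assignment $1=p $1 puts the separator text back in front and forces awk to rebuild the record, which joins its lines with single spaces (the default OFS).
NR>1 skips the first record, which is the header.
p holds the previous RT (the separator that ended the previous record). Once that separator was WATER, the current record is the footer, so awk exits and nothing from WATER onward is printed.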
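
On systems without GNU awk, the same result can be sketched in portable awk by buffering lines instead of relying on a regex RS and RT. This is not part of the original answer: the function name join_groups is made up for illustration, and the sketch assumes that GROUPS and WATER always appear at the start of a line, as in the sample input.

```shell
# join_groups: hypothetical helper name, not from the original answer.
# Joins each chunk that starts with a GROUPS line into one line and
# stops at the first line starting with WATER.
join_groups() {
    awk '
        /^WATER/  { exit }                      # footer reached; END still runs
        /^GROUPS/ { if (buf != "") print buf    # flush the previous chunk
                    buf = $0; next }            # start a new chunk
        buf != "" { buf = buf " " $0 }          # append only once inside the data
        END       { if (buf != "") print buf }  # flush the last chunk
    ' "$@"
}
```

Header lines are ignored because buf stays empty until the first GROUPS line is seen. Unlike the RS-based version, this one does not leave a trailing space at the end of each joined line.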

User contributions licensed under: CC BY-SA