I’d like an elegant awk solution to edit the lines in a file. So far I’ve only managed to complete the task using 2 sed commands and 1 awk command.
Each file is composed of a header of indeterminate length, followed by the data I want to capture, then a footer which always starts with the same string (WATER). The data is made up of several 3-line chunks, each starting with the same string (GROUPS), and I’d like to concatenate each chunk into a single line.
Whenever I find GROUPS, I want to concatenate the following lines up to the next occurrence of GROUPS, and repeat until I find WATER; then delete the WATER line and all following lines to the end of the file.
input:
header stuff
more header stuff
even more header stuff
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
.......
last line of data
WATER
footer stuff
footer stuff
footer stuff
more footer stuff
even more footer stuff
output:
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
........
GROUPS data data data data mo data mo data even more last line of data
Any help would be greatly appreciated!
EDIT:
Here are my (probably flaky) solutions!
1:Trim header
sed -n '/"GROUPS"/,$p' originalfile > outputfile1
2:Trim footer
sed '/"WATER"/,$d' outputfile1 > outputfile2
3:Concatenate lines
awk 'NF&&$1=RS$1' RS="GROUPS" outputfile2 > finaloutputfile
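For comparison, here is a minimal sketch that folds the three steps above (trim header, trim footer, concatenate lines) into a single pass of plain POSIX awk, so it does not depend on GNU awk’s multi-character RS. It assumes GROUPS and WATER appear at the start of their lines, and the function name join_groups is hypothetical. It relies on the fact that exit in a main rule still runs the END block, which is what prints the last chunk.

```shell
#!/bin/sh
# Sketch only: join_groups is a hypothetical helper name.
# Assumes GROUPS and WATER appear at the start of their lines.
# Reads the files given as arguments, or stdin if there are none.
join_groups() {
    awk '
        /^WATER/ { exit }                  # footer reached: stop reading input
        /^GROUPS/ {                        # a new chunk starts on this line
            if (buf != "") print buf       # flush the previous chunk
            buf = $0
            next
        }
        buf != "" { buf = buf " " $0 }     # inside a chunk: append with one space
        END { if (buf != "") print buf }   # exit still runs END, flushing the last chunk
    ' "$@"
}

# Demo on an abridged version of the sample input; header lines are
# skipped because no chunk has started yet (buf is still empty).
join_groups <<'EOF'
header stuff
more header stuff
GROUPS data data data data
mo data mo data mo data
even more even more
GROUPS data data data data
mo data mo data mo data
even more even more
last line of data
WATER
footer stuff
EOF
```

Used on a real file it would be called as join_groups originalfile > finaloutputfile. Building the line in a buffer instead of changing RS is what keeps it portable, at the cost of a few more lines than the GNU awk one-liner.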
Answer
Here is a GNU awk solution (GNU awk because it uses a multi-character record separator):
awk -v RS="GROUPS|WATER" -F"\n" 'p=="WATER"{exit} {$1=p $1}NR>1; {p=RT}' file

GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more ....... last line of data
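By setting RS to GROUPS|WATER, each chunk becomes one record, and with -F"\n" each original line of the chunk becomes a field. Reassigning $1=p $1 makes awk rebuild the record with a single space (the default OFS) between fields, which puts the whole chunk on one line and prefixes it with the separator that preceded it. NR>1 skips the first record, which is the header.
If the previous separator was WATER, the current record is the footer, so the script exits. This way nothing from WATER downward is printed.
p is set to the previous RT (the text that actually matched RS).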
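To see the record-rebuilding step in isolation, here is a small portable sketch. It swaps in paragraph mode (RS="", blank lines separate records) for GROUPS|WATER so it runs in any POSIX awk, not only GNU awk; only the field reassignment is the same trick as in the answer. The function name flatten_paragraphs is hypothetical.

```shell
#!/bin/sh
# Sketch only: flatten_paragraphs is a hypothetical helper name.
# RS="" (paragraph mode) stands in for GROUPS|WATER so this runs in any
# POSIX awk; with FS="\n" each line of a record becomes one field.
flatten_paragraphs() {
    awk 'BEGIN { RS = ""; FS = "\n" } { $1 = $1; print }'
}

# Assigning to $1 forces awk to rebuild $0, joining fields with OFS (one space)
printf 'GROUPS data data\nmo data\neven more\n\nsecond record\nline two\n' | flatten_paragraphs
```

This prints "GROUPS data data mo data even more" and "second record line two" on two lines. The answer’s {$1=p $1} does the same rebuild, but also prepends the previous separator stored in p.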