I’d like an elegant awk solution to edit the lines in a file. So far I’ve only managed to complete the task using 2 sed commands and 1 awk command.
Each file is composed of a header of indeterminate length, followed by the data I want to capture, then a footer which always starts with the same string (WATER). The data is made up of several 3-line chunks, which I’d like to concatenate into single lines; each 3-line chunk starts with the same string (GROUPS).
Whenever I find GROUPS, concatenate the following lines until the next occurrence of GROUPS, and repeat until WATER is found; then delete the WATER line and all following lines to the end of the file.
input:
header stuff more header stuff even more header stuff GROUPS data data data data mo data mo data mo data even more even more GROUPS data data data data mo data mo data mo data even more even more GROUPS data data data data mo data mo data mo data even more even more ....... last line of data WATER footer stuff footer stuff footer stuff more footer stuff even more footer stuff
output:
GROUPS data data data data mo data mo data mo data even more even more GROUPS data data data data mo data mo data mo data even more even more GROUPS data data data data mo data mo data mo data even more even more ........ GROUPS data data data data mo data mo data even more last line of data
Any help would be greatly appreciated!
EDIT:
Here are my (probably flaky) solutions!
1: Trim header
sed -n '/GROUPS/,$p' originalfile > outputfile1
2: Trim footer
sed '/WATER/,$d' outputfile1 > outputfile2
3: Concatenate lines
awk 'NF&&$1=RS$1' RS="GROUPS" outputfile2 > finaloutputfile
Answer
Here is a GNU awk solution (GNU-specific because it uses a multi-character record separator and RT):
awk -v RS="GROUPS|WATER" -F"\n" 'p=="WATER"{exit} {$1=p $1}NR>1; {p=RT}' file
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more
GROUPS data data data data mo data mo data mo data even more even more ....... last line of data
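Setting RS to GROUPS|WATER splits the file into records at each of those strings, and the newline field separator makes each line of a record its own field. Assigning $1=p $1 puts the separator back in front and forces awk to rebuild the record with the default output field separator (a single space), so each chunk ends up on one line.
NR>1 skips the first record, which is the header. If the previous separator was WATER, awk exits, so nothing from WATER onward is printed.
p holds the previous RT (the text that actually matched RS for the preceding record).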
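If GNU awk is not available, roughly the same result can be had without RT by accumulating lines by hand. The following is only a sketch, not part of the original answer: it assumes that GROUPS and WATER always appear at the very start of a line (as in the sample), and the function name join_groups is made up for illustration. Unlike the one-liner above, it leaves no trailing space on the output lines.

```shell
# Hypothetical helper: joins each GROUPS chunk onto one line and stops at WATER.
# Usage (file names are placeholders): join_groups originalfile > finaloutputfile
join_groups() {
    awk '
        # Footer reached: stop reading; the END rule still runs after exit.
        /^WATER/ { exit }

        # A new chunk starts: flush the previous one (if any) and begin again.
        /^GROUPS/ {
            if (line != "") print line
            line = $0
            next
        }

        # Inside a chunk: append the current line, separated by one space.
        # Header lines are skipped because no chunk has been started yet.
        line != "" { line = line " " $0 }

        # Flush the last chunk, whether input ended at WATER or at end of file.
        END { if (line != "") print line }
    ' "$@"
}
```

Because it relies only on regular-expression patterns, next, exit and an END rule, this should run under any POSIX awk; with no file arguments it reads standard input.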