I have the following string, which is the output of a cassandra query in bash
col1|col2|col3+++++++++++A|1|a B|2|b C|3|c D|4|d (3 rows)
I want to split this string so as to remove the string in the beginning till the last +
symbol and then remove the tail end, which is (XYZ rows)
.
So, the string becomes A|1|a B|2|b C|3|c D|4|d
. Now, I want to split this string into multiple arrays that look like this
A 1 a B 2 b C 3 c D 4 d
so that I can iterate over each row using a for loop to do some processing. The number of rows can vary.
How can I do this using sed or grep?
I tried this for the first pass but it didn’t work:
echo $string | sed 's/([0-9])rows//' | sed 's/[^+]//'
NOTE: the column strings can have multiple spaces in them ex: output of CQL query when written to file is
topic | partition | offset ---------+-----------+-------- topic_2 | 31 | 4 topic_2 | 30 | 4 topic_2 | 29 | 4 topic_2 | 28 | 4 topic_2 | 27 | 4 topic_2 | 26 | 4 topic_2 | 25 | 4 topic_2 | 24 | 4 topic_2 | 23 | 4 topic_2 | 22 | 4 topic_2 | 21 | 4 topic_2 | 20 | 4 topic_2 | 19 | 4 topic_2 | 18 | 4 topic_2 | 17 | 4 topic_2 | 16 | 4 topic_2 | 15 | 4 topic_2 | 14 | 4 topic_2 | 13 | 4 topic_2 | 12 | 4 topic_2 | 11 | 4 topic_2 | 10 | 4 topic_2 | 9 | 4 topic_2 | 8 | 4 topic_2 | 7 | 4 topic_2 | 6 | 4 topic_2 | 5 | 4 topic_2 | 4 | 4 topic_2 | 3 | 4 topic_2 | 2 | 4 topic_2 | 1 | 4 topic_2 | 0 | 4 (32 rows)
Advertisement
Answer
$ sed 's/[^+]*[+]*(.*[^ ]) *(.*)$/1/;y/ |/n /' <<< 'col1|col2|col3+++++++++++A|1|a B|2|b C|3|c D|4|d (3 rows)' A 1 a B 2 b C 3 c D 4 d
The substitution does the following (hat tip to potong for pointing out how to get rid of one more substitution):
s/ [^+]* # Match non-plusses [+]* # Followed by plusses ( # Capture the next group .* # Any characters (greedily) [^ ] # that end with a non-space ) # End of capture group * # Spaces (.*) # Followed by whatever in parentheses $/1/ # Replace all that by the capture group
resulting in this intermediate stage:
$ sed 's/[^+]*[+]*(.*[^ ]) *(.*)$/1/' <<< 'col1|col2|col3+++++++++++A|1|a B|2|b C|3|c D|4|d (3 rows)' A|1|a B|2|b C|3|c D|4|d
The transformation (y///
) turns all spaces into newlines and pipes into spaces.
Spaces other than the ones separating rows
If there are spaces within column and we assume that each entry has the format
[spaces]entry[spaces]
i.e., exactly two sets of spaces per entry, we have to replace the transformation y///
with another substitution,
s/([^ |])( +[^ |])/1n2/g
This looks for spaces following not a space or pipe and followed by not a space or pipe, and inserts a newline before those spaces. Result:
$ var='col1 | col2 | col3 +++++++++++ A | 1 | a B | 2 | b C | 3 | c D | 4 | d (3 rows)' $ sed 's/[^+]*[+]*(.*[^ ]) *(.*)$/1/;s/([^ |])( +[^ |])/1n2/g' <<< "$var" A | 1 | a B | 2 | b C | 3 | c D | 4 | d