Below are some lines of a CSV file. I want to extract only the Population column, using only the grep command.
id,Association,Population,Variant(s),Gene(s),PubMed
1,non-significant,Dutch,HLA-B40,HLA-B,1859103
2,non-significant,Dutch,HLA-DRB5,HLA-DRB5,1859103
3,non-significant,Finnish,APOB,APOB,8018664
4,significant,Finnish,APOC3,APOC3,8018664
5,significant,Finnish,E2/E3/E4,APOE,8018664
6,significant,French,I/D,ACE,8136829
The result I want:
Dutch
Dutch
Finnish
Finnish
Finnish
French
The command I came up with for this was
cat information.csv | grep -Eo '^([^,]*,){2}[^,]*'
which produced the output below:
id,Association,Population
1,non-significant,Dutch
2,non-significant,Dutch
3,non-significant,Finnish
4,significant,Finnish
5,significant,Finnish
6,significant,French
How can I get rid of the first two fields without using awk, sed, or any other tool?
Answer
You can use GNU grep with a PCRE pattern:
grep -Po '^([^,]*,){2}\K[^,]*' file
Here,
^ – start of string
([^,]*,){2} – two occurrences of zero or more chars other than a comma, each followed by a comma
\K – match reset operator, which discards all text matched so far
[^,]* – zero or more chars other than a comma.
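To try the pattern on the rows from the question, here is a minimal sketch. It assumes GNU grep built with PCRE support (the -P option) and writes the sample rows to a temporary file; the variable names csv, result and data_only are only for illustration. Note that the header line matches as well, so the first command also prints Population. The second variant is one possible way to skip the header, assuming every data row starts with a numeric id.

```shell
# Temporary sample file with the rows from the question (illustrative only).
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,Association,Population,Variant(s),Gene(s),PubMed
1,non-significant,Dutch,HLA-B40,HLA-B,1859103
2,non-significant,Dutch,HLA-DRB5,HLA-DRB5,1859103
3,non-significant,Finnish,APOB,APOB,8018664
4,significant,Finnish,APOC3,APOC3,8018664
5,significant,Finnish,E2/E3/E4,APOE,8018664
6,significant,French,I/D,ACE,8136829
EOF

# \K drops the two leading fields from the match, so -o prints only the third.
# The header line matches too, so "Population" is part of this output.
result=$(grep -Po '^([^,]*,){2}\K[^,]*' "$csv")
printf '%s\n' "$result"

# Variant that skips the header, assuming data rows begin with a numeric id:
# the header starts with "id", which [0-9]+ cannot match.
data_only=$(grep -Po '^[0-9]+,[^,]*,\K[^,]*' "$csv")
printf '%s\n' "$data_only"

rm -f "$csv"
```

The second command should print just the six population values listed in the question, one per line.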