Below are some lines of a CSV file. I want to extract only the Population column, using only the grep command.
id,Association,Population,Variant(s),Gene(s),PubMed
1,non-significant,Dutch,HLA-B40,HLA-B,1859103
2,non-significant,Dutch,HLA-DRB5,HLA-DRB5,1859103
3,non-significant,Finnish,APOB,APOB,8018664
4,significant,Finnish,APOC3,APOC3,8018664
5,significant,Finnish,E2/E3/E4,APOE,8018664
6,significant,French,I/D,ACE,8136829
Results I want:
Dutch
Dutch
Finnish
Finnish
Finnish
French
The command I came up with for this problem was
cat information.csv | grep -Eo '^([^,]*,){2}[^,]*'
which gave the results below
id,Association,Population
1,non-significant,Dutch
2,non-significant,Dutch
3,non-significant,Finnish
4,significant,Finnish
5,significant,Finnish
6,significant,French
How can I get rid of the leading fields without using awk, sed, or any other tools?
Answer
You may use a GNU grep with a PCRE pattern:
grep -Po '^([^,]*,){2}\K[^,]*' file
Here,
^ – start of string
([^,]*,){2} – two occurrences of zero or more chars other than , followed by a ,
\K – match reset operator that discards all text matched so far
[^,]*– zero or more chars other than a comma.
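To try this out, here is a minimal sketch that recreates the sample rows from the question in a temporary file (the variable names are made up for illustration) and runs the pattern against it. Note that the header line also has a third field, so "Population" appears in the output. The second command is one possible way to skip the header; it assumes every data row starts with a numeric id, which holds for the sample but may not hold for your real file.

```shell
# Hypothetical sample file reproducing the rows from the question.
csv_file=$(mktemp)
cat > "$csv_file" <<'EOF'
id,Association,Population,Variant(s),Gene(s),PubMed
1,non-significant,Dutch,HLA-B40,HLA-B,1859103
2,non-significant,Dutch,HLA-DRB5,HLA-DRB5,1859103
3,non-significant,Finnish,APOB,APOB,8018664
4,significant,Finnish,APOC3,APOC3,8018664
5,significant,Finnish,E2/E3/E4,APOE,8018664
6,significant,French,I/D,ACE,8136829
EOF

# \K drops the first two fields from the reported match, so -o prints
# only the third field. The header's third field is printed as well.
populations=$(grep -Po '^([^,]*,){2}\K[^,]*' "$csv_file")
printf '%s\n' "$populations"

# Assumption: data rows begin with a numeric id. Requiring digits in the
# first field means the header line ("id,...") does not match at all.
populations_no_header=$(grep -Po '^[0-9]+,[^,]*,\K[^,]*' "$csv_file")
printf '%s\n' "$populations_no_header"

rm -f "$csv_file"
```

Both commands need a grep built with PCRE support (the -P option), which is typical for GNU grep but not guaranteed on every system.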