
Using only the ‘grep’ command to get a specific column

Below are some lines of a CSV file. I want to extract only the Population column, using only the grep command.

id,Association,Population,Variant(s),Gene(s),PubMed
1,non-significant,Dutch,HLA-B40,HLA-B,1859103
2,non-significant,Dutch,HLA-DRB5,HLA-DRB5,1859103
3,non-significant,Finnish,APOB,APOB,8018664
4,significant,Finnish,APOC3,APOC3,8018664
5,significant,Finnish,E2/E3/E4,APOE,8018664
6,significant,French,I/D,ACE,8136829

The results I want:

Dutch
Dutch
Finnish
Finnish
Finnish
French

The command I came up with for this problem was

 cat information.csv | grep -Eo '^([^,]*,){2}[^,]*'

which gave the results below

id,Association,Population
1,non-significant,Dutch
2,non-significant,Dutch
3,non-significant,Finnish
4,significant,Finnish
5,significant,Finnish
6,significant,French

How can I get rid of the first two columns without using awk, sed, or any other tool?

Answer

You can use GNU grep with a PCRE pattern (the -P option):

grep -Po '^([^,]*,){2}\K[^,]*' file

Here,

  • ^ – start of the line
  • ([^,]*,){2} – two occurrences of any zero or more chars other than , and then a ,
  • \K – match reset operator, which discards all text matched so far from the reported match
  • [^,]* – zero or more chars other than a comma.
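
To see the pattern at work, here is a minimal sketch that recreates the sample data from the question in a temporary file and runs the command on it. It assumes GNU grep built with PCRE support. The variable names (csv, with_header, no_header) are only illustrative. Note that the pattern as written also matches the header line, so "Population" comes out first. The second command is a variation, not part of the answer above, that requires a numeric id in the first field so the header is skipped.

```shell
# Recreate the sample data from the question in a temporary file.
csv=$(mktemp)
cat > "$csv" <<'EOF'
id,Association,Population,Variant(s),Gene(s),PubMed
1,non-significant,Dutch,HLA-B40,HLA-B,1859103
2,non-significant,Dutch,HLA-DRB5,HLA-DRB5,1859103
3,non-significant,Finnish,APOB,APOB,8018664
4,significant,Finnish,APOC3,APOC3,8018664
5,significant,Finnish,E2/E3/E4,APOE,8018664
6,significant,French,I/D,ACE,8136829
EOF

# -P enables PCRE (GNU grep only); -o prints only the matched text.
# \K drops the first two fields from the reported match,
# so only the third field is printed. The header line matches too.
with_header=$(grep -Po '^([^,]*,){2}\K[^,]*' "$csv")

# Variation (an assumption, not from the answer): demand a numeric id
# in the first field, so the header line does not match at all.
no_header=$(grep -Po '^[0-9]+,[^,]*,\K[^,]*' "$csv")

printf '%s\n' "$with_header"
echo '---'
printf '%s\n' "$no_header"

rm -f "$csv"
```

The second form relies on the id column always being numeric, which holds for the sample shown but is an assumption about the rest of the file.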