Extracting a set of characters from a column in a txt file

Question

I have a BED file (which is a text file made of columns separated by tabs). The fourth column has a name followed by numbers. Using the command line (Linux), I would like to get these names without repetition. I provided an example below. This is my file: My list should look like this: Could you please help me with the

Accepted Answer

Given so.txt:

    1       2160195 2161184 SKI_1.2160205.2161174
    1       2234406 2234552 SKI_1.2234416.2234542
    1       2234713 2234849 SKI_1.2234723.2234839
    1       2235268 2235551 SKI_1.2235278.2235541
    1       2235721 2236034 SKI_1.2235731.2236024
    1       2237448 2237699 SKI_1.2237458.2237689
    1       2238005 2238214 SKI_1.2238015.2238204
    1       9770503 9770664 PIK3CD_1.9770513.9770654
    1       9775588 9775837 PIK3CD_1.9775598.9775827
    1       9775896 9776146 PIK3CD_1.9775906.9776136

Then the following command should do the trick:

    cat so.txt | awk '{split($4,f,".");print f[1];}' | sort -u

- $4 is the 4th column.
- We split the 4th column on the . character. The result is put into the f array.
- Finally, we filter out the duplicates with sort -u.
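As a rough sketch of two variations on the same idea (not part of the original answer): awk can read the file itself, so the cat is optional, and the duplicates can also be dropped inside awk if you would rather keep the names in the order they first appear instead of sorted. The temporary file, the variable names (tmp, sorted_names, ordered_names) and the seen array are made up for illustration; with real data you would pass so.txt directly.

```shell
# Recreate a small sample of the input: tab-separated, names in column 4.
# (Hypothetical temp file so the sketch is self-contained; use so.txt in practice.)
tmp=$(mktemp)
printf '1\t2160195\t2161184\tSKI_1.2160205.2161174\n' > "$tmp"
printf '1\t2234406\t2234552\tSKI_1.2234416.2234542\n' >> "$tmp"
printf '1\t9770503\t9770664\tPIK3CD_1.9770513.9770654\n' >> "$tmp"

# Variant 1: same pipeline as the answer, but awk reads the file directly.
# sort -u removes duplicates and returns the names in sorted order.
sorted_names=$(awk '{ split($4, f, "."); print f[1] }' "$tmp" | sort -u)

# Variant 2: deduplicate inside awk. seen[name]++ is 0 (false) only the
# first time a name occurs, so each name is printed once, in input order.
ordered_names=$(awk '{ split($4, f, "."); if (!seen[f[1]]++) print f[1] }' "$tmp")

printf '%s\n' "$sorted_names"
printf '%s\n' "$ordered_names"
rm -f "$tmp"
```

Both variants rely on awk's default whitespace field splitting, which handles the tab-separated columns as long as no field contains spaces.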

