
Extracting a set of characters from a column in a txt file

I have a BED file (a text file made up of tab-separated columns). The fourth column contains a name followed by numbers. Using the Linux command line, I would like to get these names without repetition. I have provided an example below.

This is my file:

JavaScript

My list should look like this:

JavaScript

Could you please help me with the command I need to use?

I found a solution years ago using grep, but I have lost the document where I used to save all my useful commands.


Answer

Given so.txt:

JavaScript

Then the following command should do the trick:

JavaScript
  1. $4 is the 4th column
  2. We split the 4th column on the . character. The result is put into the f array
  3. Finally we filter out the duplicates with sort -u
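
A minimal sketch of a pipeline matching the three steps above, assuming the name and the number in the 4th column are joined by a `.` (for example `geneA.1`). The sample rows written to so.txt here are hypothetical and only illustrate the layout:

```shell
# Hypothetical sample data: tab-separated, with "name.number" in column 4.
printf 'chr1\t100\t200\tgeneA.1\n' >  so.txt
printf 'chr1\t300\t400\tgeneA.2\n' >> so.txt
printf 'chr2\t500\t600\tgeneB.1\n' >> so.txt

# Split column 4 on "." into the array f, print the first piece (the name),
# then let sort -u keep a single copy of each name.
names=$(awk -F'\t' '{ split($4, f, "."); print f[1] }' so.txt | sort -u)
echo "$names"
```

Setting `-F'\t'` keeps the columns aligned even if another field contains spaces. If the names are separated from the numbers by a different character, the third argument of `split` would need to change accordingly.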