I have a .txt file like the one below:
9342432_A1 9342432 1 0 0 0
4392483_A2 4392483 2 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0
For example, I want to extract the subset of lines whose ID in the first column is exactly 4324321_A3 or 9342432. I tried the following command to find the exact matches:
grep -E '4324321_A3|9342432' file
But with this command I get the following lines:
9342432_A1 9342432 1 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0
The problem is that the line whose ID only partially matches (9342432_A1) shouldn’t be there. Can anyone help me with this?
I would like to end up with this:
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0
Answer
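It matches the line 9342432_A1 9342432 1 0 0 0 because 9342432 occurs in it twice: once as the start of the ID 9342432_A1 and once as the whole second column. Without anchors, grep accepts a match anywhere in the line.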
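To see exactly where each match comes from, one quick check is to print only the matched parts together with their line numbers. This is a sketch that assumes a grep supporting -o (GNU and BSD grep both do; it is not in POSIX) and recreates the question's data in a temporary file whose name is arbitrary:

```shell
#!/bin/sh
# Recreate the question's data in a temporary file (hypothetical location).
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' \
  '9342432_A1 9342432 1 0 0 0' \
  '4392483_A2 4392483 2 0 0 0' \
  '4324321_A3 4324321 1 0 0 0' \
  '9342432 9342432 2 0 0 0' > "$tmp"

# -o prints each matched substring on its own line; -n prefixes its line number.
# The unanchored pattern hits line 1 twice: at the start of 9342432_A1
# and again in the second column.
matches=$(grep -noE '4324321_A3|9342432' "$tmp")
printf '%s\n' "$matches"
```

Line 1 shows up twice in this output, which confirms that the pattern is found both inside the first column and in the second column.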
You need to update the command so that grep only matches those IDs at the start of the line, that is, anchor each alternative with ^word:
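$ grep -E '^4324321_A3|^9342432' file
9342432_A1 9342432 1 0 0 0
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0
The anchor alone is not enough here, though: ^9342432 is also a prefix of 9342432_A1, so that line still matches. Add -w so that each match must be a whole word. grep treats letters, digits and the underscore as word characters, so -w rejects a match that is directly followed by _A1:
$ grep -wE '^4324321_A3|^9342432' file
4324321_A3 4324321 1 0 0 0
9342432 9342432 2 0 0 0
For the same reason, grep -wE '^4324321_A3|^9342432' file would not match a line like
4324321_A3something 4324321 1 0 0 0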
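If the IDs always sit in the first column, another option is to compare the first field exactly with awk. This is a different technique from the grep approach in the answer, offered only as a sketch. Unlike grep -w, it does not depend on what counts as a word character, so an ID such as 9342432-X would not be taken for 9342432. The file names ids.txt and data.txt are hypothetical:

```shell
#!/bin/sh
# Hypothetical files: data.txt holds the question's data,
# ids.txt holds the wanted IDs, one per line.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' \
  '9342432_A1 9342432 1 0 0 0' \
  '4392483_A2 4392483 2 0 0 0' \
  '4324321_A3 4324321 1 0 0 0' \
  '9342432 9342432 2 0 0 0' > "$dir/data.txt"
printf '%s\n' 4324321_A3 9342432 > "$dir/ids.txt"

# First file (NR == FNR): store each wanted ID as an array key.
# Second file: print a line only if its first field is exactly one of those keys.
subset=$(awk 'NR == FNR { want[$1]; next } $1 in want' "$dir/ids.txt" "$dir/data.txt")
printf '%s\n' "$subset"
```

Keeping the IDs in a separate file also avoids building a long alternation pattern when there are many of them. Note that ids.txt must not be empty, otherwise the NR == FNR test would also hold for the data file.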