I have a file in which each line consists of a name, room number, house address, and phone number.
I want to find the lines whose area code is either 404 or 202. I tried “(404)|(202)”, but it also matches lines where those digits appear anywhere in the phone number rather than only in the area code, for example:
John Smith 300 123 N. Street 808-543-2029
I do not want the line above. I am targeting lines like these:
Danny Brown 173 555 W. Avenue 202-383-1540
Martha Keith 567 322 S. Example 404-653-1200
Answer
Let’s consider this test file:
$ cat addresses
John Smith 202 404 N. Street 808-543-2029
Danny Brown 173 555 W. Avenue 202-383-1540
Martha Keith 567 322 S. Example 404-653-1200
In this file, the distinguishing feature of area codes, as opposed to other three-digit numbers, is that they have a space before them and a - after them. Thus, use:
$ grep -E ' (202|404)-' addresses
Danny Brown 173 555 W. Avenue 202-383-1540
Martha Keith 567 322 S. Example 404-653-1200
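To see why the surrounding space and hyphen matter, here is a minimal sketch that recreates the test file and counts the matches for both the original pattern and the anchored one. The temporary file and the variable names (`addresses_file`, `loose`, `strict`) are only illustrative assumptions, not part of the question.

```shell
# Recreate the test file from above in a temporary location (illustrative path).
addresses_file=$(mktemp)
cat > "$addresses_file" <<'EOF'
John Smith 202 404 N. Street 808-543-2029
Danny Brown 173 555 W. Avenue 202-383-1540
Martha Keith 567 322 S. Example 404-653-1200
EOF

# Pattern from the question: matches 202 or 404 anywhere on the line,
# including the room number, the house number, and the "202" inside "2029".
loose=$(grep -cE '(404)|(202)' "$addresses_file")

# Anchored pattern: requires a space before and a hyphen after the area code.
strict=$(grep -cE ' (202|404)-' "$addresses_file")

echo "loose=$loose strict=$strict"   # prints: loose=3 strict=2

rm -f "$addresses_file"
```

The unanchored pattern counts all three lines, while the anchored one counts only the two lines whose phone number actually begins with 202 or 404.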
More complex example
Suppose that phone numbers appear at the end of lines but can have any of the three forms 808-543-2029, 8085432029, or 808 543 2029, as in the following example:
$ cat addresses
John Smith 202 404 N. Street 808-543-2029
Danny Brown 173 555 W. Avenue 2023831540
Martha Keith 567 322 S. Example 404 653 1200
To select the lines with 202 or 404 area codes:
$ grep -E ' (202|404)([- ][[:digit:]]{3}[- ][[:digit:]]{4}|[[:digit:]]{7})$' addresses
Danny Brown 173 555 W. Avenue 2023831540
Martha Keith 567 322 S. Example 404 653 1200
If the phone numbers may be followed by stray whitespace, then use:
$ grep -E ' (202|404)([- ][[:digit:]]{3}[- ][[:digit:]]{4}|[[:digit:]]{7})[[:blank:]]*$' addresses
Danny Brown 173 555 W. Avenue 2023831540
Martha Keith 567 322 S. Example 404 653 1200
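The effect of the `[[:blank:]]*` suffix can be checked with a small sketch like the one below. It assumes a variant of the test file in which the second and third lines end in trailing spaces; that variant, the temporary file, and the variable names (`pattern`, `anchored`, `tolerant`) are assumptions made for illustration only.

```shell
# Build a variant of the test file whose matching lines carry trailing spaces.
# printf is used so the trailing blanks are explicit in the quoted strings.
addresses_file=$(mktemp)
printf '%s\n' \
  'John Smith 202 404 N. Street 808-543-2029' \
  'Danny Brown 173 555 W. Avenue 2023831540  ' \
  'Martha Keith 567 322 S. Example 404 653 1200 ' > "$addresses_file"

# Shared part of both expressions: area code plus the three number forms.
pattern=' (202|404)([- ][[:digit:]]{3}[- ][[:digit:]]{4}|[[:digit:]]{7})'

# Strict end-of-line anchor: trailing blanks prevent a match.
# "|| true" keeps the script going, since grep exits with 1 when nothing matches.
anchored=$(grep -cE "${pattern}\$" "$addresses_file" || true)

# Tolerant anchor: any number of spaces or tabs may precede the end of line.
tolerant=$(grep -cE "${pattern}[[:blank:]]*\$" "$addresses_file" || true)

echo "anchored=$anchored tolerant=$tolerant"   # prints: anchored=0 tolerant=2

rm -f "$addresses_file"
```

With trailing spaces present, the strictly anchored expression finds nothing, while the tolerant one still selects the two lines with 202 and 404 area codes.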