
Recursively grep unique pattern in different files

Sorry, the title is not very clear. Let's say I'm grepping recursively for URLs like this:

grep -ERo '(http|https)://[^/"]+' /folder

and in the folder there are several files containing the same URL. My goal is to output this URL only once. I tried piping the grep output to | uniq or sort -u, but that doesn't help.

example result:

/www/tmpl/button.tpl.php:http://www.w3.org
/www/tmpl/header.tpl.php:http://www.w3.org
/www/tmpl/main.tpl.php:http://www.w3.org
/www/tmpl/master.tpl.php:http://www.w3.org
/www/tmpl/progress.tpl.php:http://www.w3.org


Answer

If the structure of the output is always: /some/path/to/file.php:http://www.someurl.org

you can use the command cut:

cut -d ':' -f 2- should work. Basically, it cuts each line into fields separated by a delimiter (here ":") and selects the 2nd and following fields (-f 2-).

After that, you can use uniq (or sort -u) to filter out the duplicates.
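For example, combining the grep command from the question with cut and sort -u should print each distinct URL exactly once (a sketch, using the /folder path from the question):

grep -ERo '(http|https)://[^/"]+' /folder | cut -d ':' -f 2- | sort -u

sort -u is used here rather than uniq alone because uniq only collapses adjacent duplicate lines, so unsorted output can still contain repeats.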

User contributions licensed under: CC BY-SA