I want to insert multiple lines from file1, marked by a pattern, into file2 using the shell.
The pattern is 10 digits, always different. Input example: “2016854218”
file1 example (input):
[...] <a class="none" data-container="#fr_5854841" href="https://example.com/profiles/2016854218"></a> <div class="new_cl"> <img src="2016854218_medium.jpg"> </div> <div class="blocker">Novaa<br> <span class="friend_small_text"> [...]
file2 example (output):
2016854218 2016859711 2017076181
Answer
EDIT: Since the OP wants the complete https link, up to and including the digits, I am adding this solution too.
awk --re-interval 'match($0,/https:[^"]*[0-9]{10}/){print substr($0,RSTART,RLENGTH)}' Input_file
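The sample line in the question has a second 10-digit run after the link (in the img src attribute), so it matters how far the part of the pattern before the digits is allowed to reach. A minimal sketch comparing a greedy `.*` with a quote-bounded `[^"]*`, assuming a grep that supports -o (GNU and BSD grep do); the variable names and the shortened sample line are illustrative only:

```shell
# Shortened, hypothetical version of the file1 line from the question.
line='<a class="none" href="https://example.com/profiles/2016854218"></a> <img src="2016854218_medium.jpg">'

# Greedy .* extends to the LAST 10-digit run on the line,
# so the match spills past the href into the img tag.
greedy=$(printf '%s\n' "$line" | grep -oE 'https:.*[0-9]{10}')

# [^"]* cannot cross the closing quote of href,
# so the match stops at the end of the link.
bounded=$(printf '%s\n' "$line" | grep -oE 'https:[^"]*[0-9]{10}')

printf 'greedy:  %s\n' "$greedy"
printf 'bounded: %s\n' "$bounded"
```

On this sample the bounded form yields only https://example.com/profiles/2016854218, while the greedy form also captures the markup up to the second ID.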
First, check whether your Input_file contains Control-M (carriage return) characters by running cat -v Input_file, which shows them as ^M.
If it does, run the dos2unix utility if you have it. If you don’t have it, use:
tr -d '\r' < Input_file > temp_file && mv temp_file Input_file
The above removes every Control-M character in the file. To remove only those at the end of each line, use:
awk '{sub(/\r$/,"")}1' Input_file > temp_file && mv temp_file Input_file
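To see the effect of stripping trailing carriage returns, here is a small sketch that runs against a throwaway file rather than your real input. The file contents and the variable names (demo_file, cr, before, after) are made up for illustration:

```shell
# Hypothetical scratch file with Windows (CRLF) line endings.
demo_file=$(mktemp)
printf 'id 2016854218\r\nid 2016859711\r\n' > "$demo_file"

# Hold a literal carriage return so grep can search for it portably.
cr=$(printf '\r')

# Count the lines that contain a carriage return before cleaning.
before=$(grep -c "$cr" "$demo_file")

# Strip a CR only at the end of each line, then replace the file.
awk '{sub(/\r$/,"")}1' "$demo_file" > "$demo_file.tmp" && mv "$demo_file.tmp" "$demo_file"

# grep -c exits non-zero when the count is 0, hence the || true.
after=$(grep -c "$cr" "$demo_file" || true)

echo "lines with CR before: $before, after: $after"
```

Both lines of the scratch file carry a carriage return at first, and none do after the awk pass.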
Once the Control-M characters are gone from Input_file, you can use the following:
awk --re-interval 'match($0,/[0-9]{10}/){print substr($0,RSTART,RLENGTH)}' Input_file > Output_file
You can drop --re-interval if you have a newer version of GNU awk (4.0 or later), where interval expressions are enabled by default.
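The awk command above prints the first match on each line, one per output line. If you need every ID exactly once and all on a single line, as in the file2 example, one possible alternative is sketched below. It is not part of the original answer and assumes a grep with the -o option (GNU and BSD grep have it). The sample input is a made-up, shortened version of file1:

```shell
# Hypothetical sample input modelled on the file1 excerpt in the question.
file1=$(mktemp)
file2=$(mktemp)
cat > "$file1" <<'EOF'
<a class="none" href="https://example.com/profiles/2016854218"></a> <img src="2016854218_medium.jpg">
<a class="none" href="https://example.com/profiles/2016859711"></a> <img src="2016859711_medium.jpg">
<a class="none" href="https://example.com/profiles/2017076181"></a> <img src="2017076181_medium.jpg">
EOF

# grep -o prints each match on its own line; -E enables the {10} interval.
# awk '!seen[$0]++' keeps only the first occurrence of each ID.
# paste -sd' ' joins the remaining lines with single spaces.
grep -oE '[0-9]{10}' "$file1" | awk '!seen[$0]++' | paste -sd' ' - > "$file2"

cat "$file2"
```

Note that [0-9]{10} also matches the first 10 digits of any longer digit run, so this only fits input where the IDs are the only runs of 10 or more digits.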