I have a script that takes MythTV recorded shows and encodes them in H.264 using HandBrake. The script is written in Perl.
My question is: how do I replace spaces and special characters with an underscore using Perl?
The strings look something like this when output: "Parks and Recreation - S05E01 - Ms. Knope Goes to Washington"
I would like it to look like this:
Parks_and_Recreation_S05E01_Ms__Knope_Goes_to_Washington
Thanks in advance. I did do some googling but haven't found anything useful that I can implement.
Answer
Something like this might do it. Note that transforming strings this way can introduce duplicates: two different titles may end up with the same name.
my $input = "Parks and Recreation - S05E01 - Ms. Knope Goes to Washington";
$input =~ s/ - /_/g;             # Replace all " - " with "_"
$input =~ s/[^A-Za-z0-9]/_/g;    # Replace all non-alphanumerics with "_"
print $input;
This outputs:
Parks_and_Recreation_S05E01_Ms__Knope_Goes_to_Washington
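To make the duplicates caveat concrete: two different titles can collapse to the same name, for example "Ms. Knope" and "Ms? Knope". Below is a minimal sketch of one way to cope with that, assuming you process all titles in a single run. The helper names `sanitize` and `unique_name` and the `%seen` hash are hypothetical, made up for this illustration; the two substitutions are the ones shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: the same two substitutions as in the answer.
sub sanitize {
    my ($name) = @_;
    $name =~ s/ - /_/g;             # " - " becomes a single "_"
    $name =~ s/[^A-Za-z0-9]/_/g;    # every other non-alphanumeric becomes "_"
    return $name;
}

my %seen;    # names already handed out during this run

# Hypothetical helper: append _2, _3, ... until the name is unused.
sub unique_name {
    my ($name) = @_;
    my $clean     = sanitize($name);
    my $candidate = $clean;
    my $n         = 1;
    $candidate = $clean . '_' . ++$n while $seen{$candidate};
    $seen{$candidate} = 1;
    return $candidate;
}

print unique_name("Ms. Knope"), "\n";    # Ms__Knope
print unique_name("Ms? Knope"), "\n";    # Ms__Knope_2
```

This only tracks names produced within one run of the script; if encoded files from earlier runs are already on disk, you would probably want to check with `-e` against the target directory as well.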
Edit
Éric's comment below is very relevant. Here is a slightly better approach that replaces accented characters with their unaccented equivalents before making the substitutions:
use utf8;
use Unicode::Normalize;

my $input = "La femme d'à côté";
my $result = NFD($input);         # Unicode Normalization Form D (NFD): canonical decomposition
$result =~ s/[^[:ascii:]]//g;     # Remove all non-ASCII characters
$result =~ s/ - /_/g;             # Replace all " - " with "_"
$result =~ s/[^A-Za-z0-9]/_/g;    # Replace all non-alphanumerics with "_"
print $result;
This variant outputs:
La_femme_d_a_cote
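If you need this in more than one place in the script, the steps can be wrapped in a subroutine. This is only a sketch: the name `sanitize_title` is hypothetical, and it assumes the title is already a decoded Perl character string (if it arrives as raw UTF-8 bytes, it would need decoding first, for example with `Encode::decode`). Also note that characters without a canonical decomposition, such as "ø", are not converted to a base letter; they are simply removed by the non-ASCII step.

```perl
use strict;
use warnings;
use utf8;                  # this source file contains UTF-8 string literals
use Unicode::Normalize;    # core module, exports NFD

# Hypothetical helper name; the body follows the same steps as the answer.
sub sanitize_title {
    my ($title) = @_;
    my $result = NFD($title);          # split "à" into "a" plus a combining accent
    $result =~ s/[^[:ascii:]]//g;      # drop the combining marks and any other non-ASCII
    $result =~ s/ - /_/g;              # " - " becomes a single "_"
    $result =~ s/[^A-Za-z0-9]/_/g;     # every other non-alphanumeric becomes "_"
    return $result;
}

print sanitize_title($_), "\n" for (
    "Parks and Recreation - S05E01 - Ms. Knope Goes to Washington",
    "La femme d'à côté",
);
# Prints:
# Parks_and_Recreation_S05E01_Ms__Knope_Goes_to_Washington
# La_femme_d_a_cote
```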