
How can I replace special characters in a string with underscores using a Perl script?

I have a script that takes MythTV recorded shows and encodes them in H.264 using HandBrake. The script is written in Perl.

My question is: how do I replace spaces and special characters with an underscore using Perl?

The strings look something like this when output: "Parks and Recreation - S05E01 - Ms. Knope Goes to Washington"

I would like it to look like this:

Parks_and_Recreation_S05E01_Ms__Knope_Goes_to_Washington

Thanks in advance. I did do some googling but haven't found anything useful that I can implement.


Answer

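Something like this might do it. Note that the transformation can introduce duplicates: two different input strings can end up as the same output string, because every non-alphanumeric character becomes the same "_".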

my $input = "Parks and Recreation - S05E01 - Ms. Knope Goes to Washington";

$input =~ s/ - /_/g; # Replace all " - " with "_"
$input =~ s/[^A-Za-z0-9]/_/g; # Replace all non-alphanumericals with "_"

print $input;

This outputs:

Parks_and_Recreation_S05E01_Ms__Knope_Goes_to_Washington
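
To illustrate the duplicate problem mentioned above, here is a minimal sketch. The helper name to_underscores and the two sample titles are made up for illustration; the two substitutions are the same ones used in the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: wraps the two substitutions from the answer.
sub to_underscores {
    my ($title) = @_;
    $title =~ s/ - /_/g;             # Replace all " - " with "_"
    $title =~ s/[^A-Za-z0-9]/_/g;    # Replace all non-alphanumerics with "_"
    return $title;
}

# Two different (invented) titles that differ only in punctuation.
my @titles = ("Ms. Knope", "Ms? Knope");

my %seen;    # maps each generated name to the first title that produced it
for my $title (@titles) {
    my $name = to_underscores($title);
    if (exists $seen{$name}) {
        printf "%s and %s both become %s\n", $seen{$name}, $title, $name;
    }
    else {
        $seen{$name} = $title;
    }
}
```

If the generated names are used as file names, a check like the %seen hash above is one possible way to notice a clash before an existing file gets overwritten.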

Edit

Éric's comment below is very relevant. Here is a slightly better approach that replaces accented characters with their unaccented equivalents before making the substitutions:

use utf8;
use Unicode::Normalize;

my $input = "La femme d'à côté";
my $result = NFD($input); # Unicode normalization Form D (NFD), canonical decomposition.
$result =~ s/[^[:ascii:]]//g; # Remove all non-ASCII characters (including the combining accents split off by NFD).
$result =~ s/ - /_/g; # Replace all " - " with "_"
$result =~ s/[^A-Za-z0-9]/_/g; # Replace all non-alphanumericals with _
print $result;

This variant outputs:

La_femme_d_a_cote
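
Since the script will probably apply this to many recordings, the steps could be wrapped in a subroutine. The following is only a sketch: the name sanitize_title is an assumption, not part of the original script, and the body repeats the steps shown above unchanged.

```perl
use strict;
use warnings;
use utf8;                            # the source file contains UTF-8 literals
use Unicode::Normalize qw(NFD);

# Hypothetical helper: turn a show title into an ASCII, underscore-separated name.
sub sanitize_title {
    my ($title) = @_;
    my $result = NFD($title);         # decompose "à" into "a" plus a combining accent
    $result =~ s/[^[:ascii:]]//g;     # drop everything non-ASCII, including the accents
    $result =~ s/ - /_/g;             # Replace all " - " with "_"
    $result =~ s/[^A-Za-z0-9]/_/g;    # Replace all remaining non-alphanumerics with "_"
    return $result;
}

print sanitize_title("La femme d'à côté"), "\n";
print sanitize_title("Parks and Recreation - S05E01 - Ms. Knope Goes to Washington"), "\n";
```

Note that characters with no ASCII base letter in their decomposition (for example "ø" or "ß") are simply removed by this approach rather than transliterated.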

User contributions licensed under: CC BY-SA