http://www.perlmonks.org?node_id=1015957


in reply to Replacing non ascii in string

Hello IanD, and welcome to the Monastery!

Try this:

14:33 >perl -wE "my $s = 'Australia’s ‘Powder Capital’'; $s =~ tr/‘’/'/; say $s;"
Australia's 'Powder Capital'
14:33 >

See tr{}{} in Quote and Quote-like Operators (perlop).

Update: Likewise,

14:42 >perl -wE "my $t = 'and ... xxx said “This is a fantastic start to the season”'; $t =~ tr/“”/\"/; say $t;"
and ... xxx said "This is a fantastic start to the season"
14:43 >

Or combined into one:

14:46 >perl -wE "my $s = qq[Australia’s ‘Powder Capital’\nand ... xxx said “This is a fantastic start to the season”]; $s =~ tr/‘’“”/''\"\"/; say $s;"
Australia's 'Powder Capital'
and ... xxx said "This is a fantastic start to the season"
14:49 >

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Replacing non ascii in string
by IanD (Initiate) on Jan 31, 2013 at 03:19 UTC

    Thanks, this basically works, but for some reason each quote comes back as three apostrophes (''') and I can't see why.

    i.e.
    $data_file =~ tr/‘’/'/;

    returns
    Australia'''s '''Powder Capital'''

    Even though the input is "Australia’s ‘Powder Capital’"

      I can’t reproduce this problem. Can you provide a complete but minimal script that demonstrates the behaviour you are seeing?

      Please specify input and output precisely, and also show the output from perl -v.

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        I have solved it by adding another line:

        $url = "http://www.flightcentre.com.au/static/SkiHolidays.xml";
        $data_string = get($url);
        $data_string =~ tr/‘’/'/;
        $data_string =~ tr/’/'/;
        $data_string =~ s/'''/'/g;

        For some reason when I view the source of that xml I see the apostrophes that I show here. However to check it I was doing the following:

        $file1 = "aafc.txt";
        open (FILE,">$file1");
        print FILE $data_string;
        close(FILE);

        Here I was seeing the string as ’Powder Capital’ instead of ‘Powder Capital’ as in the source of the XML. I don't understand why; clearly some kind of character conversion happened on reading the file. Substituting for ’ instead of ‘ worked, but it then produced three apostrophes (''') for each quote, so the final string replacement collapses them back to one. Seems bizarre to me but it works!

      Sounds like an encoding problem. Is your string UTF-8? If so, decode it before transliterating (see Encode: $decoded_str = decode('utf-8', $str)). Did you put "use utf8" in your script so that the literals in your tr/// are parsed as UTF-8 too?

      (You see, those fancy apostrophes are represented as three bytes. If Perl thinks we're still in ascii-land (binary-land), it sees the transliteration as tr/\xe2\x80\x99/'/ -- effectively changing any of those three bytes to an apostrophe.)
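
      To make that concrete, here is a minimal sketch that reproduces the tripled apostrophes and then avoids them. It assumes the fetched data really is UTF-8; the encode() call merely stands in for the bytes a fetch would return, and the variable names are illustrative rather than taken from the original script.

```perl
use strict;
use warnings;
use utf8;                         # literals in this file are UTF-8 characters
use Encode qw(decode encode);

# Assumption: stand-in for fetched data, i.e. raw UTF-8 bytes, not characters.
my $bytes = encode('UTF-8', "Australia’s ‘Powder Capital’");

# Byte-level transliteration: the curly quotes are the byte sequences
# E2 80 98 and E2 80 99, and each byte is mapped to ' on its own,
# so every quote turns into three apostrophes.
(my $broken = $bytes) =~ tr/\xe2\x80\x98\x99/'/;

# Decode to characters first; tr/// then sees one character per quote.
my $text = decode('UTF-8', $bytes);
(my $fixed = $text) =~ tr/‘’“”/''""/;

binmode STDOUT, ':encoding(UTF-8)';
print "$broken\n";                # Australia'''s '''Powder Capital'''
print "$fixed\n";                 # Australia's 'Powder Capital'
```

      With the decode step in place, the extra s/'''/'/g clean-up should no longer be needed.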

        Bingo - that makes sense.

        Sorted now, thanks for your assistance.