PerlMonks  

Re: Replacing non ascii in string

by Athanasius (Abbot)
on Jan 30, 2013 at 04:38 UTC (#1015957)


in reply to Replacing non ascii in string

Hello IanD, and welcome to the Monastery!

Try this:

    14:33 >perl -wE "my $s = 'Australia’s ‘Powder Capital’'; $s =~ tr/‘’/'/; say $s;"
    Australia's 'Powder Capital'
    14:33 >

See tr{}{} in Quote and Quote like Operators.

Update: Likewise,

    14:42 >perl -wE "my $t = 'and ... xxx said “This is a fantastic start to the season”'; $t =~ tr/“”/\"/; say $t;"
    and ... xxx said "This is a fantastic start to the season"
    14:43 >

Or combined into one:

    14:46 >perl -wE "my $s = qq[Australia’s ‘Powder Capital’\nand ... xxx said “This is a fantastic start to the season”]; $s =~ tr/‘’“”/''\"\"/; say $s;"
    Australia's 'Powder Capital'
    and ... xxx said "This is a fantastic start to the season"
    14:49 >

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,


Re^2: Replacing non ascii in string
by IanD (Initiate) on Jan 31, 2013 at 03:19 UTC

    Thanks, this basically works, but for some reason it returns three ' characters in place of each quote, and I can't see why.

    i.e.
    $data_file =~ tr/‘’/'/;

    returns
    Australia'''s '''Powder Capital'''

    Even though the input is "Australia’s ‘Powder Capital’"

      I can’t reproduce this problem. Can you provide a complete but minimal script that demonstrates the behaviour you are seeing?

      Please specify input and output precisely, and also show the output from perl -v.

      Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        I have solved it by adding another line:

        $url = "http://www.flightcentre.com.au/static/SkiHolidays.xml";
        $data_string = get($url);
        $data_string =~ tr/‘’/'/;
        $data_string =~ tr/’/'/;
        $data_string =~ s/'''/'/g;

        For some reason, when I view the source of that XML I see the apostrophes that I show here. However, to check it I was doing the following:

        $file1 = "aafc.txt";
        open (FILE,">$file1");
        print FILE $data_string;
        close(FILE);

        Here I was seeing the string as
        ’Powder Capital’ instead of ‘Powder Capital’ as in the source of the XML. I don't understand why; clearly there has been some kind of character conversion on reading the file. Substituting for ’ instead of ‘ worked, but produced three apostrophes (''') in place of each quote, so I added the final string replacement to collapse them. It seems bizarre to me, but it works!

      Sounds like an encoding problem. Is your string UTF-8? If so, decode it first (see Encode: $decoded_str = decode('utf-8', $str)). Did you add the use utf8 pragma so that the literals in your source are parsed as UTF-8 as well?

      (You see, those fancy apostrophes are represented as three bytes. If Perl thinks we're still in ascii-land (binary-land), it sees the transliteration as tr/\xe2\x80\x99/'/ -- effectively changing any of those three bytes to an apostrophe.)
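A minimal sketch of the effect just described, assuming the input holds the right single quotation mark (U+2019) as its three raw UTF-8 bytes. The variable names are made up for illustration; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+2019 written out as its three UTF-8 bytes (illustrative input).
my $bytes = "Australia\xE2\x80\x99s";

# Byte-wise transliteration: each of the three bytes is replaced
# separately, because the short replacement list repeats its last entry.
(my $bytewise = $bytes) =~ tr/\xE2\x80\x99/'/;

# After decoding, the same three bytes are a single character.
my $chars = decode('UTF-8', $bytes);

print "$bytewise\n";                    # Australia'''s
print length($bytes), " bytes\n";       # 13 bytes
print length($chars), " characters\n";  # 11 characters
```

The triple apostrophe is therefore one curly quote seen as three separate bytes, not three quotes in the data.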

        Bingo - that makes sense.

        Sorted now, thanks for your assistance.
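For later readers, the fix suggested above might look roughly like this. It is a sketch under assumptions, not the code IanD ended up with: the fetched data is assumed to arrive as undecoded UTF-8 bytes, a hard-coded string stands in for the real feed, and an in-memory filehandle stands in for aafc.txt.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for the downloaded data: raw UTF-8 bytes (assumed, not the real feed).
my $raw = "Australia\xE2\x80\x99s \xE2\x80\x98Powder Capital\xE2\x80\x99";

# Bytes -> characters, so each curly quote is one character.
my $text = decode('UTF-8', $raw);

# \x{...} escapes avoid depending on the source file's encoding;
# with "use utf8" the literal quote characters could be written instead.
$text =~ tr/\x{2018}\x{2019}\x{201C}\x{201D}/''""/;

# Three-argument open with an encoding layer encodes on output.
# \$out is an in-memory file used here in place of a real path.
my $out = '';
open my $fh, '>:encoding(UTF-8)', \$out or die "open failed: $!";
print {$fh} $text;
close $fh or die "close failed: $!";

print "$out\n";   # Australia's 'Powder Capital'
```

With the data decoded before the transliteration, the extra tr and the s/'''/'/g cleanup are no longer needed.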
