PerlMonks  

Re^2: Replacing non ascii in string

by IanD (Initiate)
on Jan 31, 2013 at 03:19 UTC


in reply to Re: Replacing non ascii in string
in thread Replacing non ascii in string

Thanks, this basically works, but for some reason it returns three ' characters where I expect one, and I can't see why.

i.e.
$data_file =~ tr/‘’/'/;

returns
Australia'''s '''Powder Capital'''

Even though the input is "Australia’s ‘Powder Capital’"


Re^3: Replacing non ascii in string
by Athanasius (Abbot) on Jan 31, 2013 at 04:10 UTC

    I can’t reproduce this problem. Can you provide a complete but minimal script that demonstrates the behaviour you are seeing?

    Please specify input and output precisely, and also show the output from perl -v.

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I have solved it by adding another line:

      $url = "http://www.flightcentre.com.au/static/SkiHolidays.xml";
      $data_string = get($url);
      $data_string =~ tr/‘’/'/;
      $data_string =~ tr/’/'/;
      $data_string =~ s/'''/'/g;

      For some reason, when I view the source of that XML I see the apostrophes as I show them here. However, to check it I was doing the following:

      $file1 = "aafc.txt";
      open (FILE,">$file1");
      print FILE $data_string;
      close(FILE);

      Here I was seeing the string as
      ’Powder Capital’ instead of ‘Powder Capital’ as in the source of the XML. I don't understand why; clearly some kind of character conversion has happened on reading the file. Substituting for ’ instead of ‘ worked, but it then produced three ''' where there should be one, so I added the string replacement to collapse them. Seems bizarre to me but it works!

Re^3: Replacing non ascii in string
by Anonymous Monk on Jan 31, 2013 at 08:41 UTC

    Sounds like an encoding problem. Is your string UTF-8? If so, decode it first (see Encode: $decoded_str = decode('utf-8', $str)). Did you "use utf8" so that the literals in your script are parsed as UTF-8?
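A minimal sketch of that suggestion, assuming the fetched data arrives as UTF-8 bytes and the script itself is saved as UTF-8. The $bytes value is a hypothetical stand-in for what get($url) returns, and the variable names are illustrative only:

```perl
use strict;
use warnings;
use utf8;                      # literals in this script are parsed as characters
use Encode qw(decode encode);

# Hypothetical stand-in for the bytes returned by get($url): the UTF-8
# encoding of "Australia’s ‘Powder Capital’".
my $bytes = "Australia\xe2\x80\x99s \xe2\x80\x98Powder Capital\xe2\x80\x99";

# Turn the raw bytes into a string of characters before editing it.
my $text = decode('UTF-8', $bytes);

# Each curly quote is now a single character, so tr replaces it one-for-one.
(my $fixed = $text) =~ tr/‘’/'/;

# The result is plain ASCII here; encode on output as a general habit.
print encode('UTF-8', $fixed), "\n";
```

With the data decoded and the literals parsed as characters, one transliteration should be enough, with no follow-up substitution to collapse repeated apostrophes.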

    (You see, in UTF-8 each of those fancy apostrophes is represented as three bytes. If Perl thinks we're still in ascii-land (binary-land), it sees the transliteration as tr/\xe2\x80\x99/'/ -- effectively changing each of those three bytes to an apostrophe.)
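The byte-level behaviour described above can be reproduced in a few lines. This is only an illustration: the input is written out as explicit UTF-8 bytes, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# No "use utf8" and no decoding, so Perl treats everything as bytes.
# These are the UTF-8 bytes of "Australia’s" (the ’ is \xe2\x80\x99).
my $bytes = "Australia\xe2\x80\x99s";

# What tr/’/'/ amounts to when the literal is read as three separate bytes:
# the search list has three entries, and each one maps to an apostrophe.
(my $mangled = $bytes) =~ tr/\xe2\x80\x99/'/;

print "$mangled\n";   # prints Australia'''s
```

Every curly quote turns into three apostrophes, which matches the output reported at the top of the thread.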

      Bingo - that makes sense.

      Sorted now, thanks for your assistance.
