
Re^2: Replacing non ascii in string

by IanD (Initiate)
on Jan 31, 2013 at 03:19 UTC (#1016198)

in reply to Re: Replacing non ascii in string
in thread Replacing non ascii in string

Thanks, this basically works, but for some reason it returns three ' characters for each curly quote, and I can't see why.

$data_file =~ tr/‘’/'/;

Australia'''s '''Powder Capital'''

Even though the input is "Australia’s ‘Powder Capital’"

Replies are listed 'Best First'.
Re^3: Replacing non ascii in string
by Athanasius (Bishop) on Jan 31, 2013 at 04:10 UTC

    I can’t reproduce this problem. Can you provide a complete but minimal script that demonstrates the behaviour you are seeing?

    Please specify input and output precisely, and also show the output from perl -v.

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I have solved it by adding another line:

      $url = "";
      $data_string = get($url);
      $data_string =~ tr/‘’/'/;
      $data_string =~ tr/’/'/;
      $data_string =~ s/'''/'/g;

      For some reason, when I view the source of that XML I see the apostrophes as I show them here. However, to check it I was doing the following:

      $file1 = "aafc.txt";
      open (FILE,">$file1");
      print FILE $data_string;
      close(FILE);

      Here I was seeing the string as
      ’Powder Capital’ instead of ‘Powder Capital’ as in the source of the XML. I don't understand why; clearly some kind of character conversion has happened on reading the file. Substituting for ’ instead of ‘ worked, but it then produced three ''' in place of each quote, so I added the final replacement to collapse them into one. Seems bizarre to me, but it works!

Re^3: Replacing non ascii in string
by Anonymous Monk on Jan 31, 2013 at 08:41 UTC

    Sounds like an encoding problem. Is your string UTF-8? If so, decode it first (see the Encode module: $decoded_str = decode('utf-8', $str)). Did you add "use utf8;" so that the literals in your source are parsed as UTF-8 too?
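    A minimal sketch of that suggestion, assuming the fetched data really is UTF-8. The helper name straighten_quotes and the sample bytes are made up for illustration. The curly quotes are written as \x{2018} and \x{2019} so the script does not depend on how the source file is saved; with "use utf8;" in a UTF-8 source file the literal characters could be used instead.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes raw UTF-8 bytes and returns UTF-8 bytes
# with curly single quotes replaced by a plain ASCII apostrophe.
sub straighten_quotes {
    my ($bytes) = @_;
    my $text = decode('UTF-8', $bytes);   # bytes -> characters
    $text =~ tr/\x{2018}\x{2019}/'/;      # two characters, not six bytes
    return encode('UTF-8', $text);        # characters -> bytes for output
}

# Sample input: the UTF-8 bytes for the curly quotes, as they would arrive undecoded.
my $raw = "Australia\xe2\x80\x99s \xe2\x80\x98Powder Capital\xe2\x80\x99";
print straighten_quotes($raw), "\n";
```

    Because the transliteration runs on decoded characters, each curly quote becomes exactly one apostrophe and no cleanup with s/'''/'/g is needed.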

    (You see, those fancy apostrophes are represented as three bytes. If Perl thinks we're still in ascii-land (binary-land), it sees the transliteration as tr/\xe2\x80\x99/'/ -- effectively changing any of those three bytes to an apostrophe.)
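    The effect described above can be reproduced in a few lines. This is only an illustrative sketch: the variable names and sample string are invented, and the bytes are spelled out as escapes so the result does not depend on the encoding of the script file.

```perl
use strict;
use warnings;

# The UTF-8 encoding of the right single quote (U+2019) is the three bytes E2 80 99.
# Here the string is left undecoded, so Perl sees three separate "characters".
my $bytes = "Australia\xe2\x80\x99s";

# Byte-level transliteration: each of the three bytes maps to an apostrophe,
# because tr repeats the last replacement character to fill the list.
(my $mangled = $bytes) =~ tr/\xe2\x80\x99/'/;

print "$mangled\n";   # prints: Australia'''s
```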

      Bingo - that makes sense.

      Sorted now, thanks for your assistance.
