PerlMonks  

utf8 && XML::Simple

by zakzebrowski (Curate)
on Feb 01, 2005 at 17:07 UTC ( #426964=perlquestion )
zakzebrowski has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
Can anyone explain why this fails on perl 5.8.3? I tried various utf8 tricks, but I can't get it to work...
Thanks.
Zak
Update: johnnywang++ and borisz++. Write the file out as utf8, but read it back in by passing the file name to XML::Simple's XMLin interface!

use XML::Simple qw(:strict);
use Encode;
# use open 'utf8'; # Can't get this to work - should open all files as utf8...
use Data::Dumper;

my $val;
$val->{utfchar} = "\x{10a0}";
my $xml = XMLout($val, KeyAttr => {item => 'name'});
open (OUT, ">out.xml");
print OUT $xml;
close OUT;

# Yes, I could use a different slurp function...
my $readin = "";
open (IN, "<out.xml");
while (<IN>) {
    $readin = $readin . $_;
}
close IN;

my $result = XMLin($readin, KeyAttr => {item => 'name'}, ForceArray => 1);
if ($result->{utfchar} eq "\x{10a0}") {
    print "Wohoo!\n";
} else {
    print "Doh!\n";
}


----
Zak - the office

Re: utf8 && XML::Simple
by borisz (Canon) on Feb 01, 2005 at 17:14 UTC
    What about use open ':utf8';?
    Wohoo! for me.
    Boris
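A minimal sketch of what the `use open ':utf8'` suggestion does, written here with the stricter `:encoding(UTF-8)` spelling (my choice, not something from this thread). It assumes a throwaway temp file; the variable names are made up for illustration. With the pragma in scope, plain `open` calls that name no layer encode on write and decode on read, so the character survives the round trip:

```perl
use strict;
use warnings;
use open ':encoding(UTF-8)';   # default layer for open() calls in this lexical scope
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# No layer given here, so the pragma's UTF-8 layer applies.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} "\x{10a0}\n";
close $out;

# Same for reading: the bytes are decoded back into one character.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my $line = <$in>;
close $in;
chomp $line;

# An explicit :raw layer bypasses the pragma and shows the bytes on disk.
open(my $raw_fh, '<:raw', $path) or die "Cannot read $path: $!";
my $raw = do { local $/; <$raw_fh> };
close $raw_fh;

print $line eq "\x{10a0}" ? "Wohoo!\n" : "Doh!\n";
```

Note that the pragma is lexically scoped, so it only affects `open` calls in the file or block where it appears.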
      Thanks for replying. In my (much more complicated real life) file, I get "Cannot decode string with wide characters at /usr/../Encode.pm line 184."


      ----
      Zak - the office
        Most likely your input data is already in utf8 and you are trying to convert it to utf8 a second time. For example:
        use Encode;
        my $str = "hi";                # "hi" is a normal string.
        $str .= chr(0x1234);           # $str is now a utf8 string
        Encode::decode_utf8($str, 1);  # here you get the error.
        Probably you swapped encode and decode, or the decode call is not needed in your case.
        Boris
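To make the encode/decode direction concrete, here is a small sketch using only the core Encode module. The variable names are illustrative, and the exact wording of the error message varies between Encode versions, so the code only captures it rather than relying on a particular text:

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# A character string that contains a wide character (code point > 255).
my $chars = "hi" . chr(0x1234);

# Decoding expects bytes. Handing it a string that already holds a wide
# character dies, which is the situation described above.
my $copy = $chars;
my $ok   = eval { decode_utf8($copy, Encode::FB_CROAK()); 1 };
my $err  = $ok ? '' : $@;

# The working direction: encode characters to bytes for output,
# decode bytes to characters on input.
my $bytes = encode_utf8($chars);   # "hi" plus three UTF-8 bytes for U+1234
my $round = decode_utf8($bytes);   # back to the original three characters

print "decode of character string failed: $err" if $err;
printf "%d characters, %d bytes\n", length($chars), length($bytes);
```

So if the data reaching decode has already been decoded once (for example by an I/O layer or by the XML parser), the second decode is the call to drop.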
Re: utf8 && XML::Simple
by johnnywang (Priest) on Feb 01, 2005 at 18:11 UTC
    Just want to point out (this is not what you're asking) that XMLin can also take a file name as first argument. So instead of:
    # Yes, I could use a different slurp function...
    my $readin = "";
    open (IN, "<out.xml");
    while (<IN>) {
        $readin = $readin . $_;
    }
    close IN;
    my $result = XMLin($readin, KeyAttr => {item => 'name'}, ForceArray => 1);
    you can just say:
    my $result = XMLin("out.xml",KeyAttr=>{item=>'name'},ForceArray=>1);
      ++ ++ !! This technique works. It looks like XML::Simple will *automatically* read a file in as utf8. So you must explicitly write the file as utf8, and then just pass the file name to XMLin to read it back... Thanks! Zak


      ----
      Zak - the office
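Putting the two halves together, a sketch of the approach described above: write the XML through an explicit UTF-8 layer, then hand XMLin the file name and let the parser do the decoding. It assumes XML::Simple and one of its parser backends are installed; the file name is hypothetical and the file is deleted at the end.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $file = 'out_utf8_demo.xml';            # hypothetical file name
my $val  = { utfchar => "\x{10a0}" };

my $xml = XMLout($val, KeyAttr => { item => 'name' });

# Write explicitly as UTF-8 so the wide character is encoded on disk.
open(my $out, '>:encoding(UTF-8)', $file) or die "Cannot write $file: $!";
print {$out} $xml;
close($out) or die "Cannot close $file: $!";

# Pass the file name (not the slurped contents) to XMLin.
my $result  = XMLin($file, KeyAttr => { item => 'name' }, ForceArray => 1);
my $matched = $result->{utfchar} eq "\x{10a0}" ? 1 : 0;

print $matched ? "Wohoo!\n" : "Doh!\n";
unlink $file;
```

The point of the design is that only one side decodes: the file is never read through a Perl I/O layer, so nothing gets decoded twice.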
      I have the same problem as zakzebrowski. When I use his example I still do not get the characters right:
      #!/usr/bin/perl
      use XML::Simple;
      use Data::Dumper;
      use Encode;

      my $content = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
      $content .= "<tag>\x{c3}\x{bb}</tag>\n";
      print "input:\n$content\n";

      my $xml = new XML::Simple;
      my $data = $xml->XMLin($content, KeepRoot => 1);
      encode_utf8($data->{'tag'});   # return value is discarded, so this has no effect
      print "data: ".$data->{'tag'}."\n";
      print Dumper $data;
      returns:
      input:
      <?xml version="1.0" encoding="UTF-8" ?>
      <tag></tag>

      data:
      $VAR1 = {
                'tag' => "\x{fb}"
              };

      My real-life code parses an XML document with XML::Simple and stores the data in a MySQL database. The database shows the same encoding problems as above.

      I have been looking at this sample code for days now with no idea how to proceed ... Any help is appreciated!

        The code appears correct: the input bytes C3 BB are the UTF-8 encoding of U+00FB (Latin Small Letter U With Circumflex), which is exactly the character Data::Dumper shows. So the next steps would be to check how the data gets stored in MySQL, how you retrieve it, and how you then display it.
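A quick way to check that reasoning without XML::Simple, using only the core Encode module (a sketch; the variable names are illustrative). The two input bytes decode to a single character, and encoding that character again gives the bytes that should be sent to a UTF-8 terminal or database connection:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $octets = "\xc3\xbb";                 # the two bytes from the <tag> content
my $char   = decode('UTF-8', $octets);   # one character, U+00FB
my $out    = encode('UTF-8', $char);     # back to the two bytes for output

printf "U+%04X, %d character, %d bytes when re-encoded\n",
    ord($char), length($char), length($out);
```

Printing `$char` itself to a handle with no encoding layer emits the single byte FB, which a UTF-8 terminal cannot display; printing `$out`, or setting an encoding layer on the handle, avoids that.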

Node Type: perlquestion [id://426964]
Approved by bart
Front-paged by kutsu