utf8 && XML::Simple

by zakzebrowski (Curate)
on Feb 01, 2005 at 17:07 UTC (#426964)
zakzebrowski has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,
Can anyone explain why this fails on perl 5.8.3? I tried various utf8 tricks, but I can't get it to work...
Update: johnnywang++ and borisz++. The fix: write the file out as utf8, but read it back in by passing the file name to XML::Simple's XMLin interface!

use XML::Simple qw(:strict);
use Encode;
# use open 'utf8'; # Can't get this to work - should open all files as utf8...
use Data::Dumper;

my $val;
$val->{utfchar} = "\x{10a0}";
my $xml = XMLout($val,KeyAttr=>{item=>'name'});
open (OUT,">out.xml");
print OUT $xml;
close OUT;

# Yes, I could use a different slurp function...
my $readin="";
open (IN,"<out.xml");
while (<IN>){
    $readin = $readin . $_;
}
close IN;

my $result = XMLin($readin,KeyAttr=>{item=>'name'},ForceArray=>1);
if ($result->{utfchar} eq "\x{10a0}"){
    print "Wohoo!\n";
}
else {
    print "Doh!\n";
}

Zak - the office

Re: utf8 && XML::Simple
by borisz (Canon) on Feb 01, 2005 at 17:14 UTC
    What about use open ':utf8';?
    With that, it prints Wohoo! for me.
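A minimal sketch of what the `use open ':utf8';` suggestion does, using only core Perl. The scratch file name `open_pragma_demo.txt` is hypothetical and not from the thread. The pragma sets the default layer for every `open()` in its lexical scope, so the write and the read below both go through UTF-8 without naming a layer on each call.

```perl
use strict;
use warnings;
use open ':utf8';    # default I/O layer for every open() in this lexical scope

my $file = 'open_pragma_demo.txt';    # hypothetical scratch file
my $text = "\x{10a0}";                # the same character used in the question

# No layer is named here; the pragma supplies it.
open(my $out, '>', $file) or die "Cannot write $file: $!";
print {$out} $text;
close($out) or die "Cannot close $file: $!";

# Read the file back; the pragma decodes the bytes into characters.
open(my $in, '<', $file) or die "Cannot read $file: $!";
my $back = do { local $/; <$in> };
close($in);
unlink $file;

print "characters read back: ", length($back), "\n";
print "round trip ok\n" if $back eq $text;
```

Newer code often spells the layer `:encoding(UTF-8)`, which validates its input more strictly; the thread uses `:utf8`, so the sketch does too.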
      Thanks for replying. In my (much more complicated real life) file, I get "Cannot decode string with wide characters at /usr/../ line 184."

      Zak - the office
        Most likely your input data is already a utf8 character string and you are trying to decode it a second time. For example:

        use Encode;
        my $str = "hi";                # "hi" is a normal string.
        $str .= chr(0x1234);           # $str is now a utf8 string
        Encode::decode_utf8($str, 1);  # here you get the error.

        You have probably swapped encode and decode, or the decode call is not needed at all in your case.
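The encode/decode direction described above can be sketched with the core Encode module. This is an illustration under the assumption that the real-life code decodes a string that already holds characters; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string containing a code point above 0xFF.
my $chars = "hi" . chr(0x1234);

# encode: characters in, UTF-8 octets out.
my $bytes = encode('UTF-8', $chars);

# decode: UTF-8 octets in, characters out.
my $round_trip = decode('UTF-8', $bytes);

# Decoding a string that already contains wide characters dies.
# A copy is passed because decode may modify its argument when a check is set.
my $copy  = $chars;
my $ok    = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
my $error = $ok ? '' : $@;

print length($chars), " characters, ", length($bytes), " bytes\n";
print "round trip ok\n" if $round_trip eq $chars;
print "second decode failed: $error" unless $ok;
```

The rule of thumb: decode once when data enters the program, encode once when it leaves, and never decode something that is already a character string.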
Re: utf8 && XML::Simple
by johnnywang (Priest) on Feb 01, 2005 at 18:11 UTC
    Just want to point out (this is not what you're asking) that XMLin can also take a file name as first argument. So instead of:
    # Yes, I could use a different slurp function...
    my $readin="";
    open (IN,"<out.xml");
    while (<IN>){
        $readin = $readin . $_;
    }
    close IN;
    my $result = XMLin($readin,KeyAttr=>{item=>'name'},ForceArray=>1);
    you can just say:
    my $result = XMLin("out.xml",KeyAttr=>{item=>'name'},ForceArray=>1);
      ++ ++ !! This technique works. It looks like XML::Simple will *automatically* read in a file as utf8. So you must explicitly write the file as utf8, and then pass the file name to XMLin to read it back... Thanks! Zak

      Zak - the office
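Putting the two replies together, the working version of the original script might look like the sketch below. It assumes XML::Simple is installed and reuses the thread's `out.xml` and option values; the three-argument `open` with an explicit `:encoding(UTF-8)` layer is a substitution for the `use open` pragma, not something the posters wrote.

```perl
use strict;
use warnings;
use XML::Simple qw(:strict);

my $file = 'out.xml';                      # same file name as in the thread
my $data = { utfchar => "\x{10a0}" };

# Write the XML through an explicit UTF-8 output layer.
open(my $out, '>:encoding(UTF-8)', $file) or die "Cannot write $file: $!";
print {$out} XMLout($data, KeyAttr => { item => 'name' });
close($out) or die "Cannot close $file: $!";

# Hand XMLin the file name and let the XML parser do the decoding.
my $result = XMLin($file, KeyAttr => { item => 'name' }, ForceArray => 1);

my $verdict = $result->{utfchar} eq "\x{10a0}" ? 'Wohoo!' : 'Doh!';
print "$verdict\n";
```

Naming the layer on the `open` call keeps the encoding decision next to the file it applies to, which can be easier to audit than a file-wide pragma.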
      I have the same problem as zakzebrowski. When I use his example I still do not get the characters right:
      #!/usr/bin/perl
      use XML::Simple;
      use Data::Dumper;
      use Encode;

      my $content = "<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n";
      $content .= "<tag>\x{c3}\x{bb}</tag>\n";
      print "input:\n$content\n";

      my $xml = new XML::Simple;
      my $data = $xml->XMLin($content, KeepRoot => 1);
      encode_utf8($data->{'tag'});
      print "data: ".$data->{'tag'}."\n";
      print Dumper $data;

      input:
      <?xml version="1.0" encoding="UTF-8" ?>
      <tag></tag>

      data:
      $VAR1 = { 'tag' => "\x{fb}" };

      My real-life code parses an XML document with XML::Simple and stores the data in a MySQL database. The data in the database shows the same encoding problems as above.

      I have been staring at this sample code for days now with no idea how to proceed ... Any help is appreciated!

        The code appears correct: the bytes C3 BB are the UTF-8 encoding of U+00FB, Latin Small Letter U With Circumflex, so "\x{fb}" is exactly the character the input contains. The next steps would be to check how the data gets stored in MySQL, how you retrieve the data and how you then display the data.
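The claim about 00FB can be checked with the core Encode module alone. The sketch below uses the two octets from the post; the variable names are illustrative. It also shows that `encode_utf8` returns the encoded string rather than changing its argument, so calling it without using the return value, as the posted script does, has no effect on the stored data.

```perl
use strict;
use warnings;
use Encode qw(decode encode_utf8);

my $bytes = "\x{c3}\x{bb}";             # the two octets inside <tag> in the post
my $char  = decode('UTF-8', $bytes);    # one character after decoding

printf "%d bytes -> %d character, U+%04X\n",
    length($bytes), length($char), ord($char);

# encode_utf8 returns the octets; it does not modify $char in place.
my $out = encode_utf8($char);
print "same octets as the input\n" if $out eq $bytes;
print "character string unchanged\n" if length($char) == 1;
```

To print or store the value as UTF-8, the returned octets (here `$out`) are what should be written, or the output handle should carry a UTF-8 layer.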
