PerlMonks  

XML::Simple throws decode error in encode.pm

by Anonymous Monk
on Aug 03, 2010 at 22:40 UTC ( [id://852764]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am using XML::Simple to parse an XML string. The string is loaded from a signed UTF-8 file. When I call the XMLin($string) method to parse the document, I get this error:

Wide character in print at C:\Texts\Programs/SIFX.pm line 42.

Then Data::Dumper doesn't dump anything. It seems that everything just stops with that error. When I print the string, it looks fine. The only non-ANSI character I can find in it is this dash: —, which appears several times.

One of the containing lines is:

<LEX st="Kasulatan" id="tl" chrBrk="—" tSt="s11"/>

It really bugs me because this snippet works fine:

my $blah = $xml->XMLin('<LEX st="Kasulatan" id="tl" chrBrk="—" tSt="s11"/>');
print Dumper($blah);
but it doesn't work when it is loaded from a file.

Replies are listed 'Best First'.
Re: XML::Simple throws decode error in encode.pm
by ikegami (Patriarch) on Aug 04, 2010 at 03:49 UTC

    Wide character in print at C:\Texts\Programs/SIFX.pm line 42.

    Your XML parser is working correctly (returning decoded text), but you forgot to encode the text before outputting it. An easy way of fixing that is to add an encoding layer to the file handle.

    open(my $fh, '>:encoding(UTF-8)', ...) or die; print($fh $text);
    use open ':std', ':encoding(UTF-8)'; print($text);
    You can also do it manually.
    use Encode qw( encode ); print($fh encode("UTF-8", $text));
    use Encode qw( encode ); print(encode("UTF-8", $text));

    It really bugs me because this snippet works fine:

    Because Dumper encodes characters above 255 into escape sequences, hiding your bug (lack of encoding).

    >perl -wle"print chr(0x2660)"
    Wide character in print at -e line 1.
    <junk><junk><junk>

    >perl -MData::Dumper -we"print Dumper chr(0x2660)"
    $VAR1 = "\x{2660}";
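To make the two approaches above concrete, here is a minimal sketch that uses only the core Encode module. The in-memory file handle and the sample string are assumptions chosen for illustration; they stand in for a real output file and the parsed XML data. It suggests that an `:encoding(UTF-8)` layer and a manual `encode` call produce the same bytes.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Decoded text containing U+2014 (hypothetical sample data).
my $text = "chrBrk=\x{2014}";

# Route 1: let an :encoding layer do the work.
# An in-memory handle stands in for a real output file here.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";
print {$fh} $text;
close($fh) or die "close failed: $!";

# Route 2: encode by hand before printing.
my $bytes = encode('UTF-8', $text);

printf "characters: %d, bytes: %d, identical: %s\n",
    length($text), length($bytes), ($buffer eq $bytes ? 'yes' : 'no');
# prints "characters: 8, bytes: 10, identical: yes"
```

Either route avoids the "Wide character in print" warning, because the handle only ever receives bytes.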
Re: XML::Simple throws decode error in encode.pm
by almut (Canon) on Aug 03, 2010 at 23:15 UTC

    I think it would help if you could provide a complete, runnable example that reproduces the error.

    I played around a bit, but the only thing remotely similar I managed to produce is

    Cannot decode string with wide characters at /usr/lib/perl/5.8/Encode.pm line 166.

    when I do exactly what it complains about, i.e. pass XMLin() a string with already decoded (wide) characters...

    What is line 42 in SIFX.pm, is this the line with $xml->XMLin(...)?  Your subject says it "throws decode error in encode.pm", but you show a different error (or rather warning) message...(?)
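The situation described above can be approximated without XML::Simple. This is only a sketch under the assumption that the parser ends up calling Encode's `decode` on its input, which the Encode.pm location in the message hints at; the sample string is hypothetical.

```perl
use strict;
use warnings;
use Encode qw( decode );

# Characters, not UTF-8 bytes: U+2014 does not fit in a single byte.
my $already_decoded = "chrBrk=\x{2014}";

# Decoding text that is already decoded is expected to die.
my $result = eval { decode('UTF-8', $already_decoded) };
my $error  = $@;

print defined $result
    ? "decoded without complaint\n"
    : "decode died: $error";
```

The point is only that decoding is a bytes-to-characters step, so applying it to a string that already holds characters above 255 cannot succeed.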

      Shoot! I pasted the wrong error. I meant to say that the error printed is

      Cannot decode string with wide characters at C:/Perl/lib/Encode.pm line 174.

      Okay, I found a way to reproduce it. Place the following lines in a text file and save it as UTF-8:

      <?xml version="1.0" encoding="utf-8" standalone="yes"?>
      <etax id="{e961ee2c-a029-489a-8bf4-3c2ecef7f019}" ettx="TL-Scriptures.ettx">
      <sifx>
      <LEX st="Kasulatan" id="tl" chrBrk="–—" tSt="s11"/>
      <LEX st="Pambungad" id="tl" chrBrk="–—" tSt="p11"/>
      <LEX st="Panimula" id="tl" chrBrk="–—" tSt="h11"/>
      </sifx>

      The following is the module I wrote where the error occurs:

      #!/usr/bin/perl -l
      package SIFX;
      use strict;
      use XML::Simple;
      use Data::Dumper;

      sub new(){#scalar file name optional
          my $class = shift;
          my $self = {
              etaxFile => '',
              SIFX => {},
          };
          bless $self, $class;
          load($self,shift) if @_ ==1;
          return $self;
      }

      sub load(){
          return -1 if(@_ == 0);
          my ($self, $input) = @_;
          my $sifx;
          if((substr $input, -4, 4) eq '.txt'){#if input was a file name
              $self->{etaxFile} = $input;
              open my $etax, '<:utf8', $input or print "Could not open etax file at __LINE__";
              my $text;
              while($text ne '<sifx>'){#until beginning of SIFX
                  chomp($text = <$etax>);
              }
              $sifx= '<sifx>';
              do{#until end of SIFX
                  chomp($text = <$etax>);
                  $sifx .= $text;
              }while($text !~ m#</sifx>#);
              close $etax;
          }
          else{
              $sifx = $input;
          }
          my $xml = XML::Simple->new();
          $self->{SIFX} = $xml->XMLin($sifx);
          print "SIFX hash is : ";
          print Dumper($self->{SIFX});
      }
      1;

      And you can test it with the following after changing the $testSifx variable to the path of the text file:

      my $xml = XML::Simple->new();
      my $testSifx = "C:\\Users\\nate\\Desktop\\testSIFX.txt";
      my $sifx = SIFX->new($testSifx);
      print Dumper($sifx);

      I apologize for the original, inadequate, inaccurate post.

        As already hinted at, XMLin() doesn't like already decoded input; it wants bytes/octets.  IOW, simply open the file as

        open my $etax, '<', $input or ...
        #               ^ no :utf8
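The difference between the two ways of opening the file can be seen with core Perl alone. The sketch below is an illustration under assumptions: an in-memory buffer stands in for the UTF-8 file on disk, and the `<LEX>` line is shortened. It does not call XML::Simple itself.

```perl
use strict;
use warnings;
use Encode qw( encode );

# UTF-8 bytes as they would sit on disk; "\xE2\x80\x94" is U+2014.
my $file_bytes = qq{<LEX chrBrk="\xE2\x80\x94"/>\n};

# Reading through a decoding layer yields characters.
open(my $in_text, '<:encoding(UTF-8)', \$file_bytes) or die "open failed: $!";
my $decoded = <$in_text>;
close($in_text);

# Reading with a plain '<' yields the raw bytes.
open(my $in_raw, '<', \$file_bytes) or die "open failed: $!";
my $raw = <$in_raw>;
close($in_raw);

# If the text has already been decoded, encoding it again recovers the bytes.
my $reencoded = encode('UTF-8', $decoded);

printf "decoded has wide char: %s\n",  ($decoded =~ /[^\x00-\xFF]/ ? 'yes' : 'no');
printf "raw has wide char: %s\n",      ($raw     =~ /[^\x00-\xFF]/ ? 'yes' : 'no');
printf "re-encoded equals raw: %s\n",  ($reencoded eq $raw ? 'yes' : 'no');
# prints yes, no, yes
```

Following the advice above, a string like `$raw` (or `$reencoded`, if the decoding cannot be avoided) would be the one to hand to XMLin().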
Re: XML::Simple throws decode error in encode.pm
by Khen1950fx (Canon) on Aug 04, 2010 at 01:56 UTC
    You are using an external proprietary program. I checked the ETAX Documentation, and I couldn't find any usage documentation. I did find chrBrk which is supposed to be a list of break characters. Again, no documentation for it. Here's the way that I tried it:
    #!/usr/bin/perl
    use strict;
    use Data::Dumper;
    use XML::Simple qw(:strict);

    my $xs = XML::Simple->new();
    my $blah = $xs->XMLin('/root/Desktop/txt.xml', ForceArray => 1, KeyAttr => 1);
    binmode STDOUT, ':utf8';
    print Dumper($blah);
