PerlMonks  

Re^5: XML::Simple parser error : Input is not proper UTF-8, indicate encoding

by BrowserUk (Pope) on Aug 10, 2012 at 14:08 UTC (#986755=note)


in reply to Re^4: XML::Simple parser error : Input is not proper UTF-8, indicate encoding
in thread XML::Simple parser error : Input is not proper UTF-8, indicate encoding

"The OP, ..., is using XML 1.0."

If we're being pedantic, the OP's problem is that he isn't using any form of XML!

But if he decides to do so, he can make up his own mind about which standard to use, because, despite what the "rest of the world" is using, the tools support it (even without a header!):

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    use XML::Simple;

    my $xml = XMLin( \*DATA );
    pp $xml;

    __DATA__
    <EVENT>
    <CALLDETAILS>
    <STATIONID>01</STATIONID>
    <CALLSESSIONID>00000000020712130852059</CALLSESSIONID>
    <EXTENSIONNO>8143</EXTENSIONNO>
    <ZIVAHCHANNELID>172.16.39.88</ZIVAHCHANNELID>
    <SUBCHANNELID>0</SUBCHANNELID>
    <AGENTID>NULL</AGENTID>
    <CALLERID>&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;^N</CALLERID>
    <CALLEEID>NULL</CALLEEID>
    <CALLTYPE>IN</CALLTYPE>
    <RINGCOUNT>1</RINGCOUNT>
    <CALLTERMSTATUS>NO_CTI_DATA</CALLTERMSTATUS>
    </CALLDETAILS>
    </EVENT>

Produces:

    [14:58:34.75] C:\test>xmlent.pl
    {
      CALLDETAILS => {
        AGENTID => "NULL",
        CALLEEID => "NULL",
        CALLERID => pack("H*","a06a57b768aef5bf8a3761b7d854d95e4e"),
        CALLSESSIONID => "00000000020712130852059",
        CALLTERMSTATUS => "NO_CTI_DATA",
        CALLTYPE => "IN",
        EXTENSIONNO => 8143,
        RINGCOUNT => 1,
        STATIONID => "01",
        SUBCHANNELID => 0,
        ZIVAHCHANNELID => "172.16.39.88",
      },
    }
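daxim's objection turns on whether the trailing ^N in the sample is one control character or two literal characters. A quick core-Perl check, offered as a sketch rather than a real parser (it only expands hexadecimal &#x...; references), decodes the posted CALLERID text by hand and prints it in the same hex form that Data::Dump used:

```perl
use strict;
use warnings;

# CALLERID text exactly as posted in the __DATA__ section,
# ending in the two literal characters '^' and 'N'.
my $raw = '&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;^N';

# Expand hexadecimal character references (&#xHH;) to the characters they name.
# Sketch only: decimal references and named entities are not handled.
(my $decoded = $raw) =~ s/&#x([0-9A-Fa-f]+);/chr(hex $1)/ge;

# Show the decoded string as hex digits, for comparison with the
# pack("H*", ...) expression that Data::Dump printed.
my $hex = unpack 'H*', $decoded;
print "$hex\n";    # prints a06a57b768aef5bf8a3761b7d854d95e4e
```

The final `5e4e` is `^` followed by `N`, so the sample that parsed cleanly never contained chr(14) at all.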


Re^6: XML::Simple parser error : Input is not proper UTF-8, indicate encoding
by daxim (Curate) on Aug 10, 2012 at 17:18 UTC
    nvivek arrived at this weird notation (angle brackets, and ^ followed by a letter) by displaying the file in vi. Your example program is wrong: it contains a literal ^ and N, because you neglected to replace this notation with the original character. When that is corrected, the program predictably bombs out with:
        $ perl pm986755.pl
        Entity: line 9: parser error : PCDATA invalid Char value 14
        <CALLERID>&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;<
                                                                              ^
    When the character is substituted with the character reference &#x0e;, it also bombs out:
        $ perl pm986755.pl
        Entity: line 9: parser error : xmlParseCharRef: invalid xmlChar value 14
        <CALLERID>&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;&#x0e;
                                                                                    ^
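Both errors come from the same rule: XML 1.0 only admits characters matching its Char production, and the well-formedness constraint "Legal Character" applies the same test to character references. Character 14 (0x0E, which vi shows as ^N) is outside that set. A minimal core-Perl sketch of the check; the sub names are hypothetical (not from any module) and only hexadecimal references are expanded:

```perl
use strict;
use warnings;

# Hypothetical helper: true if code point $cp matches the XML 1.0 Char production
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
sub is_xml10_char {
    my ($cp) = @_;
    return $cp == 0x9 || $cp == 0xA || $cp == 0xD
        || ($cp >= 0x20    && $cp <= 0xD7FF)
        || ($cp >= 0xE000  && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

# Hypothetical helper: expand &#xHH; references first (the "Legal Character"
# constraint covers them too), then return the code points XML 1.0 rejects.
sub illegal_xml10_chars {
    my ($text) = @_;
    (my $expanded = $text) =~ s/&#x([0-9A-Fa-f]+);/chr(hex $1)/ge;
    return grep { !is_xml10_char($_) } map { ord($_) } split //, $expanded;
}

# Tail of the CALLERID value: once with a raw chr(14), once with &#x0e;.
for my $tail ("T&#xD9;\x0E", 'T&#xD9;&#x0e;') {
    print join(',', illegal_xml10_chars($tail)), "\n";    # prints 14 both times
}
```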
    Upgrading the version in the PI to 1.1 does not help: XML::Simple, or rather its underlying modules XML::Parser/expat and XML::LibXML/libxml2, cannot deal with XML 1.1!

    Your advice was flawed from the beginning; it simply cannot work in the general case. Whatever puts control characters there is apt to also put a chr(0) character there, and whether as a plain character or as a character reference, chr(0) is illegal in all versions of XML.
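XML 1.1 widens the Char production to [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF], but its RestrictedChar controls (which include 0x0E) may appear only as character references, and #x0 remains excluded, just as it is in XML 1.0. A small sketch of those two 1.1 productions (hypothetical sub names, core Perl only) shows why a document declaring version 1.1 could carry &#x0e; but never chr(0), even with a parser that did support 1.1:

```perl
use strict;
use warnings;

# Hypothetical helper: XML 1.1 Char production
#   [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
sub is_xml11_char {
    my ($cp) = @_;
    return ($cp >= 0x1     && $cp <= 0xD7FF)
        || ($cp >= 0xE000  && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

# Hypothetical helper: XML 1.1 RestrictedChar production
#   [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] | [#x7F-#x84] | [#x86-#x9F]
# These may appear in a 1.1 document only as character references.
sub is_xml11_restricted {
    my ($cp) = @_;
    return ($cp >= 0x1  && $cp <= 0x8)
        || ($cp >= 0xB  && $cp <= 0xC)
        || ($cp >= 0xE  && $cp <= 0x1F)
        || ($cp >= 0x7F && $cp <= 0x84)
        || ($cp >= 0x86 && $cp <= 0x9F);
}

for my $cp (0x0E, 0x00) {
    my $verdict = !is_xml11_char($cp)      ? 'illegal in XML 1.1'
                : is_xml11_restricted($cp) ? 'legal in XML 1.1 only as a character reference'
                :                            'legal in XML 1.1';
    printf "U+%04X: %s\n", $cp, $verdict;
}
# U+000E: legal in XML 1.1 only as a character reference
# U+0000: illegal in XML 1.1
```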
