PerlMonks  

Re^4: XML::Simple parser error : Input is not proper UTF-8, indicate encoding

by daxim (Chaplain)
on Aug 10, 2012 at 09:49 UTC (#986750)


in reply to Re^3: XML::Simple parser error : Input is not proper UTF-8, indicate encoding
in thread XML::Simple parser error : Input is not proper UTF-8, indicate encoding

The OP, like the rest of the world, is using XML 1.0. XML 1.1 made too little progress and gained no adoption.

Edited to add: nah, I'm good.

$ rpm -qf `which xmllint`
libxml2-2.7.8+git20110708-3.8.1.x86_64

Re^5: XML::Simple parser error : Input is not proper UTF-8, indicate encoding
by BrowserUk (Pope) on Aug 10, 2012 at 10:08 UTC
    The OP, ..., is using XML 1.0.

    If we're being pedantic, the OP's problem is that he isn't using any form of XML!

    But if he decides to do so, he can make up his own mind about which standard he chooses, because -- despite what the "rest of the world" is using -- the tools support it (even without a header!):

    #! perl -slw
    use strict;
    use Data::Dump qw[ pp ];
    use XML::Simple;

    my $xml = XMLin( \*DATA );
    pp $xml;

    __DATA__
    <EVENT>
    <CALLDETAILS>
    <STATIONID>01</STATIONID>
    <CALLSESSIONID>00000000020712130852059</CALLSESSIONID>
    <EXTENSIONNO>8143</EXTENSIONNO>
    <ZIVAHCHANNELID>172.16.39.88</ZIVAHCHANNELID>
    <SUBCHANNELID>0</SUBCHANNELID>
    <AGENTID>NULL</AGENTID>
    <CALLERID>&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;^N</CALLERID>
    <CALLEEID>NULL</CALLEEID>
    <CALLTYPE>IN</CALLTYPE>
    <RINGCOUNT>1</RINGCOUNT>
    <CALLTERMSTATUS>NO_CTI_DATA</CALLTERMSTATUS>
    </CALLDETAILS>
    </EVENT>

    Produces:

    [14:58:34.75] C:\test>xmlent.pl
    {
      CALLDETAILS => {
        AGENTID => "NULL",
        CALLEEID => "NULL",
        CALLERID => pack("H*","a06a57b768aef5bf8a3761b7d854d95e4e"),
        CALLSESSIONID => "00000000020712130852059",
        CALLTERMSTATUS => "NO_CTI_DATA",
        CALLTYPE => "IN",
        EXTENSIONNO => 8143,
        RINGCOUNT => 1,
        STATIONID => "01",
        SUBCHANNELID => 0,
        ZIVAHCHANNELID => "172.16.39.88",
      },
    }

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      nvivek arrived at this odd notation, with angle brackets and a ^ followed by a letter, by displaying the data in vi. Your example program is wrong: it contains a literal ^ and N, because you neglected to replace the notation with the original character. When that is corrected, the program predictably bombs out with:
      $ perl pm986755.pl
      Entity: line 9: parser error : PCDATA invalid Char value 14
      <CALLERID>&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;<
                                                                            ^
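The value 14 in that message is simply ^N decoded. As a minimal sketch, assuming the usual caret notation used by vi and `cat -v` (the letter's ASCII code with bit 0x40 flipped), and with `decode_caret` being a made-up helper name for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: turn caret notation such as "^N" back into the
# control character it stands for, by flipping bit 0x40 of the letter.
sub decode_caret {
    my ($notation) = @_;
    my ($letter) = $notation =~ /\A\^([\x3F-\x5F])\z/
        or die "not caret notation: $notation";
    return chr( ord($letter) ^ 0x40 );
}

my $char = decode_caret('^N');
printf "^N is code point %d (0x%02X)\n", ord($char), ord($char);
```

So the two characters ^ and N in the example program stand in for a single byte, 0x0E, and that byte is what the parser rejects.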
      When the character is substituted with the character reference &#x0e;, it also bombs out:
      $ perl pm986755.pl
      Entity: line 9: parser error : xmlParseCharRef: invalid xmlChar value 14
      <CALLERID>&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;&#x0e;
                                                                                 ^
      Changing the version in the XML declaration to 1.1 does not help. XML::Simple, or rather its underlying modules XML::Parser/expat and XML::LibXML/libxml2, cannot deal with XML 1.1!

      Your advice was flawed from the beginning; it simply cannot work in the general case. Whatever puts control characters there is apt to put a chr(0) character there as well, and that one is illegal in every version of XML, whether it appears as a plain character or as a character reference.
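To see which characters would trip the parser before handing data to it, something along these lines may help. It is only a sketch: the character class is transcribed from the Char production of the XML 1.0 specification, `illegal_xml10_chars` is a hypothetical helper name, and the sample string is the CALLERID value from this thread with ^N replaced by the real 0x0E character.

```perl
use strict;
use warnings;

# XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class below matches everything outside those ranges.
my $XML10_ILLEGAL =
    qr/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/;

# Hypothetical helper: return one [offset, code point] pair for every
# character of $text that XML 1.0 does not allow.
sub illegal_xml10_chars {
    my ($text) = @_;
    my @found;
    while ( $text =~ /($XML10_ILLEGAL)/g ) {
        push @found, [ pos($text) - 1, ord $1 ];
    }
    return @found;
}

# CALLERID content from the thread, with ^N as the actual control character.
my $caller_id = "\xA0jW\xB7h\xAE\xF5\xBF\x{8A}7a\xB7\xD8T\xD9\x0E";

for my $hit ( illegal_xml10_chars($caller_id) ) {
    printf "offset %d: U+%04X is not allowed in XML 1.0\n", @$hit;
}
```

For this sample only the final character, U+000E, is reported; the high-bit characters such as U+008A are legal in XML 1.0. A chr(0) anywhere in the input would be reported too, and no change of XML version would make it acceptable.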
