PerlMonks  

Re^2: XML::Simple parser error : Input is not proper UTF-8, indicate encoding

by daxim (Curate)
on Aug 10, 2012 at 13:37 UTC ( [id://986747]=note )


in reply to Re: XML::Simple parser error : Input is not proper UTF-8, indicate encoding
in thread XML::Simple parser error : Input is not proper UTF-8, indicate encoding

Character reference encoding does not help at all. The character itself is illegal, not its representation.
$ echo '<root>&#x0e;</root>' | xmllint -
-:1: parser error : xmlParseCharRef: invalid xmlChar value 14
<root>&#x0e;</root>
            ^

Likewise CDATA is unsuitable:

$ perl -e'print "<root><![CDATA[\x{0e}]]></root>"' | xmllint -
-:1: parser error : PCDATA invalid Char value 14
<root><![CDATA[]]></root>
               ^
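The two failures above follow from the XML 1.0 `Char` production, which admits only U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF, regardless of how the character is written. As a minimal sketch (the helper names `is_xml10_char` and `illegal_xml10_codepoints` are made up for illustration, and the reference matching is deliberately naive), one could check input before handing it to a parser:

```perl
use strict;
use warnings;

# True if the code point is allowed by the XML 1.0 "Char" production.
sub is_xml10_char {
    my ($cp) = @_;
    return $cp == 0x9 || $cp == 0xA || $cp == 0xD
        || ( $cp >= 0x20    && $cp <= 0xD7FF )
        || ( $cp >= 0xE000  && $cp <= 0xFFFD )
        || ( $cp >= 0x10000 && $cp <= 0x10FFFF );
}

# Hypothetical helper: collect every code point that appears literally or
# as a numeric character reference, and return the ones XML 1.0 forbids.
sub illegal_xml10_codepoints {
    my ($xml) = @_;
    my @cps = map { ord } split //, $xml;
    push @cps, hex $1 while $xml =~ /&#x([0-9A-Fa-f]+);/g;
    push @cps, 0 + $1 while $xml =~ /&#([0-9]+);/g;
    return grep { !is_xml10_char($_) } @cps;
}

printf "U+%04X is not allowed in XML 1.0\n", $_
    for illegal_xml10_codepoints('<root>&#x0e;</root>');
```

The same list comes back whether U+000E is written literally, inside CDATA, or as `&#x0e;`, which is the point made above: the character is illegal, not its representation.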

Replies are listed 'Best First'.
Re^3: XML::Simple parser error : Input is not proper UTF-8, indicate encoding
by BrowserUk (Patriarch) on Aug 10, 2012 at 13:46 UTC

    Hm... maybe you need to update your copy of xmllint?

    "XML 1.1 extends the set of allowed characters to include all the above, plus the remaining characters in the range U+0001–U+001F. At the same time, however, it restricts the use of C0 and C1 control characters other than U+0009, U+000A, U+000D, and U+0085 by requiring them to be written in escaped form (for example U+0001 must be written as &#x01; or its equivalent). In the case of C1 characters, this restriction is a backwards incompatibility; it was introduced to allow common encoding errors to be detected."

    From what I can make out, having an encoding header is both obligatory and required to make sense of how entities should be interpreted.
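The escaping rule in the quoted passage can be sketched as follows. This is only an illustration: `escape_xml11_restricted` is a hypothetical helper, the character ranges are those of the XML 1.1 `RestrictedChar` production, and the sketch does not escape `&` or `<`. Whether a particular parser accepts version 1.1 documents is a separate question that this does not settle.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite the characters XML 1.1 calls "RestrictedChar"
# (C0 and C1 controls other than U+0009, U+000A, U+000D and U+0085) as
# numeric character references.
sub escape_xml11_restricted {
    my ($text) = @_;
    # U+0000 is excluded from XML 1.1 in every form, escaped or not.
    die "U+0000 is not allowed in XML 1.1\n" if $text =~ /\x{0}/;
    $text =~ s/([\x{1}-\x{8}\x{B}\x{C}\x{E}-\x{1F}\x{7F}-\x{84}\x{86}-\x{9F}])/sprintf '&#x%02X;', ord $1/ge;
    return $text;
}

print qq{<?xml version="1.1"?>\n<root>},
    escape_xml11_restricted("a\x{0e}b"), "</root>\n";
```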


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      The OP, like the rest of the world, is using XML 1.0. XML 1.1 made too little progress and gained no adoption.

      Edited to add: nah, I'm good.

      $ rpm -qf `which xmllint`
      libxml2-2.7.8+git20110708-3.8.1.x86_64
        The OP, ..., is using XML 1.0.

        If we're being pedantic, the OP's problem is that he isn't using any form of XML!

        But if he decides to do so, he can make up his own mind about which standard he chooses, because -- despite what the "rest of the world" is using -- the tools support it (even without a header!):

        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];
        use XML::Simple;

        my $xml = XMLin( \*DATA );
        pp $xml;

        __DATA__
        <EVENT>
          <CALLDETAILS>
            <STATIONID>01</STATIONID>
            <CALLSESSIONID>00000000020712130852059</CALLSESSIONID>
            <EXTENSIONNO>8143</EXTENSIONNO>
            <ZIVAHCHANNELID>172.16.39.88</ZIVAHCHANNELID>
            <SUBCHANNELID>0</SUBCHANNELID>
            <AGENTID>NULL</AGENTID>
            <CALLERID>&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;^N</CALLERID>
            <CALLEEID>NULL</CALLEEID>
            <CALLTYPE>IN</CALLTYPE>
            <RINGCOUNT>1</RINGCOUNT>
            <CALLTERMSTATUS>NO_CTI_DATA</CALLTERMSTATUS>
          </CALLDETAILS>
        </EVENT>

        Produces:

        [14:58:34.75] C:\test>xmlent.pl
        {
          CALLDETAILS => {
            AGENTID        => "NULL",
            CALLEEID       => "NULL",
            CALLERID       => pack("H*","a06a57b768aef5bf8a3761b7d854d95e4e"),
            CALLSESSIONID  => "00000000020712130852059",
            CALLTERMSTATUS => "NO_CTI_DATA",
            CALLTYPE       => "IN",
            EXTENSIONNO    => 8143,
            RINGCOUNT      => 1,
            STATIONID      => "01",
            SUBCHANNELID   => 0,
            ZIVAHCHANNELID => "172.16.39.88",
          },
        }
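To see how the CALLERID text maps onto the `pack("H*", ...)` string in the output above, the references can be expanded by hand. This is a minimal sketch, not what XML::Simple does internally; `expand_hex_refs` is a hypothetical helper that handles only hexadecimal references.

```perl
use strict;
use warnings;

# Hypothetical helper: expand hexadecimal character references by hand.
sub expand_hex_refs {
    my ($text) = @_;
    $text =~ s/&#x([0-9A-Fa-f]+);/chr hex $1/ge;
    return $text;
}

my $callerid = '&#xA0;jW&#xB7;h&#xAE;&#xF5;&#xBF;&#x8A;7a&#xB7;&#xD8;T&#xD9;^N';
my $decoded  = expand_hex_refs($callerid);

# One hex pair per character; every code point here is below 0x100.
print join( ' ', map { sprintf '%02x', ord } split //, $decoded ), "\n";
```

The hex pairs line up with the dumped value. Note that the trailing `5e 4e` are the two printable characters `^` and `N` as they appear in the sample data, not U+000E, and that every referenced code point in this sample (including the C1 control U+008A) lies inside the range XML 1.0 permits.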

