PerlMonks  

Re: XML::Simple parser error : Input is not proper UTF-8, indicate encoding

by BrowserUk (Patriarch)
on Aug 10, 2012 at 13:32 UTC ( [id://986745]=note )


in reply to XML::Simple parser error : Input is not proper UTF-8, indicate encoding

Your data isn't valid XML. The only control characters allowed are tab, CR and LF.

You'd need to wrap your callerid data in CDATA tags, or encode the offending characters as numeric character references, before an XML parser will process it.
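
To see which characters in the input actually fall outside what XML 1.0 permits, something like the following might help. It is only a sketch: the sub name `find_invalid_xml10_chars` and the sample string are made up for illustration, and it assumes the text has already been decoded into Perl characters. The character class is the negation of the Char production from the XML 1.0 specification.

```perl
use strict;
use warnings;

# Hypothetical helper: list the code points in $text that are outside the
# XML 1.0 Char production (tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD,
# U+10000-U+10FFFF). Assumes $text is a decoded character string.
sub find_invalid_xml10_chars {
    my ($text) = @_;
    my @bad = $text =~ /([^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}])/g;
    return map { sprintf 'U+%04X', ord($_) } @bad;
}

# Made-up sample input containing a Shift Out (U+000E) control character.
my @bad = find_invalid_xml10_chars("<callerid>abc\x{0e}def</callerid>");
print "@bad\n";   # prints U+000E
```

An empty list means every character is at least legal in XML 1.0; it says nothing about whether the markup itself is well-formed.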


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^2: XML::Simple parser error : Input is not proper UTF-8, indicate encoding
by daxim (Curate) on Aug 10, 2012 at 13:37 UTC
    Character reference encoding does not help at all. The character itself is illegal, not its representation.
    $ echo '<root>&#x0e;</root>' | xmllint -
    -:1: parser error : xmlParseCharRef: invalid xmlChar value 14
    <root>&#x0e;</root>
                ^

    Likewise CDATA is unsuitable:

    $ perl -e'print "<root><![CDATA[\x{0e}]]></root>"' | xmllint -
    -:1: parser error : PCDATA invalid Char value 14
    <root><![CDATA[]]></root>
                   ^
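
Since neither a character reference nor CDATA makes U+000E legal in XML 1.0, one option is to remove or replace such characters before the data reaches the parser. A minimal sketch, assuming the input is already a decoded character string and that losing those characters is acceptable for the data in question; the sub name `strip_invalid_xml10_chars` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every character outside the XML 1.0 Char
# production with $replacement (default: drop it entirely).
sub strip_invalid_xml10_chars {
    my ($text, $replacement) = @_;
    $replacement = '' unless defined $replacement;
    $text =~ s/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/$replacement/g;
    return $text;
}

print strip_invalid_xml10_chars("<root>abc\x{0e}def</root>"), "\n";   # prints <root>abcdef</root>
```

Passing a second argument such as `'?'` keeps a visible marker where a character was removed, which may be preferable if the positions matter.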

      Hm....maybe you need to update your copy of xmllint?

      "XML 1.1 extends the set of allowed characters to include all the above, plus the remaining characters in the range U+0001–U+001F. At the same time, however, it restricts the use of C0 and C1 control characters other than U+0009, U+000A, U+000D, and U+0085 by requiring them to be written in escaped form (for example U+0001 must be written as &#x01; or its equivalent). In the case of C1 characters, this restriction is a backwards incompatibility; it was introduced to allow common encoding errors to be detected."
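
The escaping rule in that quote could be sketched as below. This only makes sense if the document declares version="1.1" and the parser in use honours it, which is an assumption here. The sub name `escape_xml11_restricted` is hypothetical, and the ranges follow the RestrictedChar production of the XML 1.1 specification, which also covers U+007F. U+0000 stays forbidden even in 1.1 and is left untouched by this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: write XML 1.1 RestrictedChar code points
# (U+0001-U+0008, U+000B-U+000C, U+000E-U+001F, U+007F-U+0084,
# U+0086-U+009F) as numeric character references.
sub escape_xml11_restricted {
    my ($text) = @_;
    $text =~ s/([\x{1}-\x{8}\x{B}\x{C}\x{E}-\x{1F}\x{7F}-\x{84}\x{86}-\x{9F}])/sprintf('&#x%02X;', ord($1))/ge;
    return $text;
}

print escape_xml11_restricted("abc\x{0e}def"), "\n";   # prints abc&#x0E;def
```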

      From what I can make out; having an encoding header is both obligatory, and required to make sense of how entities should be interpreted.



        The OP, like the rest of the world, is using XML 1.0. XML 1.1 made too little progress and gained no adoption.

        Edited to add: nah, I'm good.

        $ rpm -qf `which xmllint`
        libxml2-2.7.8+git20110708-3.8.1.x86_64
