PerlMonks  

Re^3: XML Parser not well-formed

by mirod (Canon)
on Nov 02, 2004 at 17:45 UTC ( [id://404690]=note )


in reply to Re^2: XML Parser not well-formed
in thread XML Parser not well-formed

I have actually got around the problem by processing the file manually before loading it up with XML::Parser, and removing the dodgy characters.

That's one way to do it. You could probably figure out which encoding they are in and replace them with the proper UTF-8 characters. My guess is that some of the text, like the DESCRIPTION, is entered through either a web form or a word processor, so it should be possible to find out what encoding is used.
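As a rough illustration of replacing the characters instead of deleting them, here is a minimal Python sketch. It assumes the stray bytes are Windows-1252, which is only a guess (a common one for text pasted from a word processor); the thread does not confirm the actual encoding. The function name `cp1252_to_utf8` and the sample element content are made up for the example.

```python
def cp1252_to_utf8(raw: bytes) -> bytes:
    """Re-encode bytes as UTF-8, ASSUMING the input is Windows-1252.

    The source encoding is a guess; swap in another codec name if the
    data turns out to come from somewhere else.
    """
    # decode() raises UnicodeDecodeError on the few byte values that
    # Python's cp1252 codec leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    text = raw.decode("cp1252")
    return text.encode("utf-8")


# Hypothetical sample: a lone 0x92 byte inside otherwise ASCII content.
sample = b"<DESCRIPTION>It\x92s here</DESCRIPTION>"
fixed = cp1252_to_utf8(sample)
print(fixed)
```

Under that assumption the 0x92 byte becomes a right single quotation mark (U+2019), so the text survives intact and the result can be handed to the parser as UTF-8.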

Replies are listed 'Best First'.
Re^4: XML Parser not well-formed
by ktingle (Sexton) on Nov 02, 2004 at 20:01 UTC
    That character is 0x92. UTF-8 only maps values up to 0x7F to a single byte. If the document represents that character with just one byte, then it's not UTF-8 and the XML instance is broken. In UTF-8, code point 0x92 is represented with 2 bytes (0xC2 0x92).

    Whenever I get confused about UTF-8 I use this reference:

    http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
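The byte-count point in the reply above can be checked with a short Python sketch. The helper name `is_valid_utf8` is made up for the example; it relies only on the standard strict UTF-8 decoder.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lone 0x92 is a continuation byte with no lead byte, so it is rejected.
lone_byte_ok = is_valid_utf8(b"\x92")

# Code point 0x92 encoded as UTF-8 takes two bytes.
two_bytes = chr(0x92).encode("utf-8")

# 0x7F is the last value that fits in a single byte.
one_byte = chr(0x7F).encode("utf-8")

print(lone_byte_ok, two_bytes, one_byte)
```

Note that this treats 0x92 as a code point. If the byte actually came from Windows-1252, which is an assumption and not something the thread establishes, it stands for U+2019 and would take three bytes in UTF-8.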
