
not well formed (invalid token)

by Anonymous Monk
on Nov 04, 2009 at 10:33 UTC (#804892=perlquestion)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I have a problem when I try to access some files that contain non-English letters. I have this:
<english id="337">Barings was Britain's oldest merchant bank.</english>
<french id="337">La Barings était la plus vieille banque d'affaires d'Angleterre.</french>
and I want to access it using XML::XPath, but I get this error:
not well-formed (invalid token) at line 4, column 28, byte 109:
It is because of the first character of "était". How can it be solved without touching the input data? Thanks.

Replies are listed 'Best First'.
Re: not well formed (invalid token)
by Corion (Pope) on Nov 04, 2009 at 10:37 UTC

    That looks like your data might be Unicode encoded as UTF-8. So you will have to decode the data prior to using XML::XPath on it, or tell XML::XPath to decode it properly.

      Any clue how to decode it?

        Yes. See the Encode module which I already linked in my previous reply.
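A minimal sketch of what re-encoding with Encode could look like, assuming the files are actually ISO-8859-1 rather than UTF-8 (the thread does not confirm the encoding). The helper name `to_utf8_bytes` and the sample string are made up for illustration; `decode` and `encode` are the functions exported by the Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: the input bytes are ISO-8859-1 (Latin-1).
# Change this if the files use a different encoding.
my $source_encoding = 'ISO-8859-1';

# Hypothetical helper: convert raw bytes in a legacy encoding to UTF-8 bytes.
sub to_utf8_bytes {
    my ($raw_bytes, $from) = @_;
    my $chars = decode($from, $raw_bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);          # characters -> UTF-8 bytes
}

# "\xE9" is the single Latin-1 byte for the first letter of "était".
my $latin1 = "<french id=\"337\">La Barings \xE9tait la plus vieille banque d'affaires d'Angleterre.</french>";
my $utf8   = to_utf8_bytes($latin1, $source_encoding);
```

The conversion happens in memory only, so the files on disk stay untouched. The resulting string could then be handed to the parser, for example with `XML::XPath->new(xml => $utf8)`.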

Re: not well formed (invalid token)
by Jenda (Abbot) on Nov 04, 2009 at 14:13 UTC

    Looks like the XML is missing the encoding declaration, so the parser assumes it's UTF-8 and complains when it's not. Try adding

    <?xml version="1.0" encoding="ISO-8859-1"?>
    at the top of the file. (Replace ISO-8859-1 with whatever encoding the files use.) Until then the files are not well-formed XML.
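Since the question asks for a fix that leaves the input data alone, the declaration could also be prepended in memory instead of being written into the files. This is only a sketch: the helper name `with_declaration` and the sample document are hypothetical, and ISO-8859-1 is an assumption about the real encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: return the XML text with an encoding declaration
# prepended, unless the text already starts with an XML declaration.
# No newline is added, so line numbers in parser errors stay the same.
sub with_declaration {
    my ($raw_xml, $encoding) = @_;
    return $raw_xml if $raw_xml =~ /\A<\?xml\s/;
    return qq{<?xml version="1.0" encoding="$encoding"?>} . $raw_xml;
}

# Sample data standing in for the file contents; "\xE9" is a Latin-1 byte.
my $raw = "<doc><french id=\"337\">La Barings \xE9tait la plus vieille banque d'affaires d'Angleterre.</french></doc>";
my $xml = with_declaration($raw, 'ISO-8859-1');
```

For real files, the contents would be read as raw bytes (for instance with the `<:raw` layer on `open`) before calling the helper, and the result passed to `XML::XPath->new(xml => $xml)`.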

