not well formed (invalid token)

Anonymous Monk
Dear monks, I have a problem accessing some files that contain non-English letters. I have this:
<english id="337">Barings was Britain's oldest merchant bank.</english>
<french id="337">La Barings était la plus vieille banque d'affaires d'Angleterre.</french>
and I want to access it using XML::XPath, but I get this error:
not well-formed (invalid token) at line 4, column 28, byte 109:
It is because of the first character of "était". How can it be solved without touching the input data? Thanks.

Re: not well formed (invalid token)
by Corion (Pope) on Nov 04, 2009 at 10:37 UTC

    That looks like your data might be Unicode encoded as UTF-8. So you will have to decode the data prior to using XML::XPath on it, or tell XML::XPath to decode it properly.

      Any clue how to decode it?

        Yes. See the Encode module which I already linked in my previous reply.
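
The decode-first idea can be sketched as follows. This is an illustration in Python rather than Perl, chosen because Python's ElementTree is built on expat, the same parser that XML::Parser (and so, as far as I know, XML::XPath) relies on, so it raises the same "not well-formed (invalid token)" message. The `<pair>` root element, the helper names `parse_raw` and `parse_decoded`, and the ISO-8859-1 encoding are assumptions; the thread never states what encoding the files actually use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the thread's sentence pair wrapped in an assumed
# <pair> root, stored as ISO-8859-1 bytes with no XML declaration.
RAW = (
    "<pair>"
    "<english id=\"337\">Barings was Britain's oldest merchant bank.</english>"
    "<french id=\"337\">La Barings était la plus vieille banque "
    "d'affaires d'Angleterre.</french>"
    "</pair>"
).encode("iso-8859-1")


def parse_raw(data):
    """Parse the bytes as-is; return the error message, or None on success."""
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        return str(err)
    return None


def parse_decoded(data, encoding="iso-8859-1"):
    """Decode the bytes to text first, then hand the text to the parser."""
    return ET.fromstring(data.decode(encoding))


error = parse_raw(RAW)                        # fails: 0xE9 is not valid UTF-8 here
french = parse_decoded(RAW).find("french").text
print(error)
print(french)
```

The input file is only read, never modified; the decoding happens on the in-memory copy. In Perl the corresponding step would go through the Encode module mentioned above.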

Re: not well formed (invalid token)
by Jenda (Abbot) on Nov 04, 2009 at 14:13 UTC

    Looks like the XML is missing the encoding specification, so the parser assumes it's UTF-8 and complains when it's not. Try adding

    <?xml version="1.0" encoding="ISO-8859-1"?>
    at the top of the file. (Replace ISO-8859-1 with whatever encoding the files use.) Until then the files are not correct XML.
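
Since the original question asks for a fix that leaves the input data untouched, one option is to prepend the declaration in memory rather than editing the file. The following is a minimal sketch of that idea in Python (whose ElementTree sits on the same expat parser as XML::Parser), not a tested XML::XPath recipe. The helper name `parse_with_declaration` is hypothetical, and ISO-8859-1 is an assumption about the files.

```python
import xml.etree.ElementTree as ET

# Assumed encoding; replace with whatever the files really use.
DECLARATION = b'<?xml version="1.0" encoding="ISO-8859-1"?>\n'


def parse_with_declaration(data, declaration=DECLARATION):
    """Prepend an XML declaration in memory if the data does not start with one."""
    if not data.startswith(b"<?xml"):
        data = declaration + data
    return ET.fromstring(data)


# Hypothetical sample: the French element from the thread as ISO-8859-1 bytes.
RAW = (
    "<french id=\"337\">La Barings était la plus vieille banque "
    "d'affaires d'Angleterre.</french>"
).encode("iso-8859-1")

root = parse_with_declaration(RAW)
print(root.get("id"), root.text)
```

With a real file, `RAW` would come from reading the file in binary mode, so the bytes on disk stay exactly as they were.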

