PerlMonks  

not well formed (invalid token)

by Anonymous Monk
on Nov 04, 2009 at 10:33 UTC ( #804892=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I have a problem when I try to access some files that contain non-English letters. I have this:
<english id="337">Barings was Britain's oldest merchant bank.</english>
<french id="337">La Barings était la plus vieille banque d'affaires d'Angleterre.</french>
and I want to access it using XML::XPath, but I get this error:
not well-formed (invalid token) at line 4, column 28, byte 109:
It is because of the first character of "était". How can it be solved without touching the input data? Thanks.

Re: not well formed (invalid token)
by Corion (Pope) on Nov 04, 2009 at 10:37 UTC

    That looks like your data might be Unicode encoded as UTF-8. So you will have to decode the data prior to using XML::XPath on it, or tell XML::XPath to decode it properly.

      any clue how to decode it?

        Yes. See the Encode module which I already linked in my previous reply.
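
        A minimal sketch of what that decoding step might look like with the core Encode module. It assumes the input bytes are ISO-8859-1 (Latin-1), which the thread never confirms, and the helper name latin1_bytes_to_utf8 is hypothetical. The file on disk is not modified; only the in-memory copy is re-encoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn Latin-1 bytes into UTF-8 bytes so that a parser
# which assumes UTF-8 by default will accept them.
# Assumption: the source encoding really is ISO-8859-1.
sub latin1_bytes_to_utf8 {
    my ($bytes) = @_;
    my $chars = decode('ISO-8859-1', $bytes);   # raw bytes -> Perl characters
    return encode('UTF-8', $chars);             # characters -> UTF-8 bytes
}

# "\xE9" is the single Latin-1 byte for the first letter of "était".
my $latin1 = qq{<french id="337">La Barings \xE9tait ...</french>};
my $utf8   = latin1_bytes_to_utf8($latin1);
```

        The resulting string could then be handed to the parser, for example with XML::XPath->new(xml => $utf8), instead of pointing it at the file directly.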

Re: not well formed (invalid token)
by Jenda (Abbot) on Nov 04, 2009 at 14:13 UTC

    Looks like the XML is missing the encoding specification, so the parser assumes it's UTF-8 and complains when it's not. Try adding

    <?xml version="1.0" encoding="ISO-8859-1"?>
    at the top of the file. (Replace ISO-8859-1 with whatever encoding the files actually use.) Until then, the files are not correct XML.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.
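
    Since the question asks for a fix that leaves the input data alone, one possible reading of the suggestion above is to prepend the declaration in memory rather than editing the file. This is only a sketch: the helper name with_xml_declaration is hypothetical, and ISO-8859-1 is a guess at the real encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: return the XML text with an encoding declaration
# prepended, without changing anything on disk.
# Assumption: the caller knows (or guesses) the encoding of the bytes.
sub with_xml_declaration {
    my ($xml, $encoding) = @_;
    return $xml if $xml =~ /\A<\?xml\b/;   # already has a declaration, leave as-is
    return qq{<?xml version="1.0" encoding="$encoding"?>\n} . $xml;
}

# "\xE9" stands in for the Latin-1 byte that triggers the parse error.
my $raw      = qq{<french id="337">La Barings \xE9tait ...</french>};
my $declared = with_xml_declaration($raw, 'ISO-8859-1');
```

    The declared string, read from the file in raw mode, could then be passed to the parser via XML::XPath->new(xml => $declared).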

Node Type: perlquestion [id://804892]
Approved by moritz