PerlMonks
XML invalid token
by Ea (Chaplain) on Nov 15, 2011 at 15:13 UTC
Ea has asked for the wisdom of the Perl Monks concerning the following question:
I'm parsing an XML document that has an acute accent acting as a right quote. It's char(180) (aka U+00B4) and the document encoding is UTF-8. When I run XML::Parser over it (or even the xml_pp tool), I get a "not well-formed (invalid token)" error.
I've naively tried adding use utf8; to the script, but I still get the error. I believe I could just tr/// that bad boy into something less problematic, but I was wondering if there was a lazier way, like a setting in XML::Parser that I can add alongside the handlers? For the curious, I'm getting my output from LaTeXML, a set of Perl tools for converting LaTeX to XML. There might be some scope to process the output before I parse the XML, but I suspect that it'll look a. Thanks,
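For reference, here is a minimal sketch of the "process the output before parsing" route. It rests on an assumption the post does not confirm: that the file holds the accent as the single byte 0xB4 (its Latin-1 form) rather than the two-byte UTF-8 sequence 0xC2 0xB4, which would explain the "invalid token" error. The helper name to_utf8_bytes and the sample string are hypothetical; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return bytes that are well-formed UTF-8.
# Assumption: input that fails strict UTF-8 decoding is really Latin-1.
sub to_utf8_bytes {
    my ($bytes) = @_;

    # decode() with a CHECK value may modify its argument, so test a copy.
    my $copy = $bytes;
    my $is_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };

    # Already valid UTF-8: hand the bytes back untouched.
    return $bytes if $is_utf8;

    # Otherwise reinterpret the bytes as Latin-1 and re-encode as UTF-8.
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# Sample input with a lone 0xB4 byte, which is not valid UTF-8.
my $raw   = "<q>it\xB4s</q>";
my $fixed = to_utf8_bytes($raw);

# Show the repaired bytes in hex for inspection.
print join(' ', map { sprintf '%02X', ord } split //, $fixed), "\n";
```

The repaired bytes could then be passed to the parser's parse method. If the whole document turns out to be Latin-1, the ProtocolEncoding option documented for XML::Parser (for example ProtocolEncoding => 'ISO-8859-1') may be the lazier setting, though whether it fits this LaTeXML output is untested here. Note that use utf8; only affects how the script's own source text is read, not the data being parsed.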
perl -e 'print qq(Just another Perl Hacker\n)' # where's the irony switch?