XML::Parser and &entity;
by dingus (Friar)
on Nov 26, 2002 at 15:48 UTC
dingus has asked for the
wisdom of the Perl Monks concerning the following question:
This is a follow-up to yesterday's question (XML Simple Charset Q?) about parsing data.
As well as some Latin-1 accented characters, I also have some (valid) HTML entities such as &plusmn; ( ± ) and &le; ( ≤ ). Unfortunately, no matter what I do, my XML::Parser always barfs on these entities. I've tried changing the charset, both the one used in the file and the one passed as a raw parameter (and, when playing with XML::Twig, using keep_encoding), and I've switched the top-level XML package (XML::Simple or XML::Twig), all to no avail.
My HTML::Entities correctly recognises and converts these entities, so that's presumably not the issue.
Is this an XML::Parser bug (and if so, is it due to an old version of XML::Parser)? Or am I just completely misunderstanding something? I would greatly prefer not to have to convert these entities manually before handing the data off to the parser. Ideally I'd like them left untouched, since none of the parsers needs to change them.
Sample always failing code
Error: undefined entity at line 3, column 17, byte 129 at c:/Perl/site/lib/XML/Parser.pm line 168
Line 3, column 17 appears to be the &ge; ( ≥ ).
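For reference, a minimal sketch of this failure mode. It assumes a hypothetical three-line document, not the sample from the post, and a hypothetical helper named try_parse. XML itself predefines only &lt;, &gt;, &amp;, &apos; and &quot;, so any other named entity has to be declared before the parser will accept it. The second document declares &ge; in an internal DTD subset purely to show the contrast.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns the parser's error text, or '' on success.
sub try_parse {
    my ($xml) = @_;
    my $parser = XML::Parser->new;
    return eval { $parser->parse($xml); 1 } ? '' : "$@";
}

# Hypothetical input: &ge; is used but never declared.
my $undeclared = <<'XML';
<?xml version="1.0"?>
<doc>
  <item>x &ge; 3</item>
</doc>
XML

# Same document, with &ge; declared in an internal DTD subset.
my $declared = <<'XML';
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ENTITY ge "&#8805;">
]>
<doc>
  <item>x &ge; 3</item>
</doc>
XML

for my $case ( [ undeclared => $undeclared ], [ declared => $declared ] ) {
    my ( $name, $xml ) = @$case;
    my $err = try_parse($xml);
    $err =~ s/^\s+|\s+$//g;    # trim the surrounding whitespace in the message
    print "$name: ", ( $err eq '' ? 'parsed OK' : $err ), "\n";
}
```

The first case should report an "undefined entity" error of the same kind as above, while the second should parse. Whether a declaration like this is acceptable depends on whether the input files can be given a DOCTYPE at all.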