PerlMonks
XML::RSS
by alexg (Beadle) on Mar 21, 2003 at 10:58 UTC ( [id://244827]=perlquestion )
alexg has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I'm writing a quick and dirty RSS feed aggregator and I'm getting very frustrated with one tiny problem. Occasionally the RSS XML docs contain characters which cause XML::Parser to choke.
The XML::Parser error message generated is:

When I look at byte 530, it turns out to be the 'é' in Nescafé. Other exotic characters also cause XML::Parser to stop dead. I've tried the nice_string function from the Unicode man page:
but this enrages XML::Parser even further, and it fails at the first end-of-line character. I'm using LWP::Simple to grab the XML, so my script essentially looks like this:
Can anyone recommend a module or function that will reliably sanitise the string that get() returns into an encoding suitable for XML::Parser?

PS: I've looked at the 'encoding' option for XML::Parser, but it doesn't seem to change the result :(
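
For illustration only, here is a minimal sketch of one way such a sanitising step might look. It is not from the original post and rests on assumptions: that the offending feeds are really ISO-8859-1 bytes, and that they carry no XML encoding declaration (or one that says UTF-8), so the parser treats them as UTF-8. The helper name to_utf8_bytes is hypothetical. It uses only the core Encode module: bytes that are already valid UTF-8 pass through untouched, and anything else is reinterpreted as ISO-8859-1 and re-encoded as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not from the original post): return UTF-8 bytes,
# ASSUMING that any input which is not valid UTF-8 is really ISO-8859-1.
sub to_utf8_bytes {
    my ($bytes) = @_;

    # Strict decode: dies on malformed UTF-8. LEAVE_SRC keeps $bytes intact.
    my $is_utf8 = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
        1;
    };
    return $bytes if $is_utf8;

    # Fallback: treat each byte as one ISO-8859-1 character, emit UTF-8.
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

# A lone 0xE9 byte ('é' in ISO-8859-1) is not valid UTF-8 on its own.
my $raw  = "<title>Nescaf\xE9</title>";
my $safe = to_utf8_bytes($raw);    # the \xE9 byte becomes \xC3\xA9
```

The string returned by get() would be passed through such a helper before being handed to the parser. If a feed declares a different encoding in its XML declaration, rewriting the bytes this way would contradict that declaration, so this sketch only fits the case assumed above.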