in reply to XML::RSS
Malformed XML is the bane of RSS. According to Mark Pilgrim, about 10% of typical RSS feeds are malformed*. Indeed, the UK IT publication The Register has usable XML for only a few days in a given month.
You will find a wide range of problems that will cause XML::Parser, the core of XML::RSS, to explode:
- Data encoded in one encoding but declared as another (or not declared at all, so the parser assumes the default UTF-8).
- Junk before the XML declaration. The CMS Vignette tends to do this, and it's popular with big companies.
- Badly nested tags. The CMS is sloppy about checking well-formedness, so broken markup comes out of it and goes straight into the RSS feed.
- Improperly escaped ampersands and entities are a very common problem too.
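As a rough illustration of the kind of pre-cleaning that helps with two of these problems (junk before the declaration and bare ampersands), here is a minimal sketch using only core Perl. The helper name pre_clean_feed is made up for this example; it is not part of XML::RSS or XML::RSS::Tools, and it is not how either module does its own clean-up.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of any CPAN module.
sub pre_clean_feed {
    my ($xml) = @_;

    # Drop everything before the first '<' (whitespace, stray text, a BOM).
    # Assumes the junk itself contains no '<'.
    $xml =~ s/\A[^<]*//;

    # Escape any '&' that does not start a named entity or a numeric
    # character reference. Named entities are left alone even when XML
    # does not define them, so those can still break the parser.
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $xml;
}

my $dirty = qq{\n  junk<?xml version="1.0"?><rss><title>Fish & Chips &amp; Peas</title></rss>};
print pre_clean_feed($dirty), "\n";
```

This does nothing for badly nested tags or wrongly declared encodings, so it is still worth wrapping the real parse in an eval block, since XML::Parser dies on anything that is not well formed.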
In this node "How do I clean RSS feeds to make them usable?", Matts suggested his rssmirror, the guts of which are now included in both XML::RSS and XML::RSS::Tools.
I became so annoyed with bad XML in RSS feeds that I wrote XML::RSS::Tools to deal with the problems I found. That led to brian d foy taking over XML::RSS, fixing a lot of its problems and, in time, designing a whole new version.
See also:
Good Luck!
--
ajt