PerlMonks  

Re: XML::RSS

by ajt (Prior)
on Mar 24, 2003 at 21:40 UTC (#245550)


in reply to XML::RSS

alexg,

Malformed XML is the bane of RSS. According to Mark Pilgrim, about 10% of typical RSS feeds are malformed*. Indeed, the UK IT publication The Register has usable XML for only a few days in a given month.

You will find a wide range of problems that will cause XML::Parser, the core of XML::RSS, to explode:

  • Data encoded in one character encoding but declared as another (or left undeclared, so the parser assumes the default UTF-8).
  • Junk before the XML declaration. The CMS Vignette tends to do this, and it's popular with big companies.
  • Badly nested tags. The CMS is sloppy about checking well-formedness, so broken markup comes out of it and goes straight into the RSS feed.
  • Improperly escaped ampersands and entities are a very common problem too.
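
To make two of those failure modes concrete, here is a minimal sketch of a pre-parse cleanup using only core Perl. The clean_feed helper and the sample feed are hypothetical, invented for illustration; this is not how XML::RSS or XML::RSS::Tools do it internally. It only handles leading junk and bare ampersands, and it is a heuristic: it does nothing about encoding mismatches or bad nesting, it would also rewrite ampersands inside CDATA sections, and undeclared HTML entities such as &nbsp; will still upset a strict parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any CPAN module): best-effort cleanup
# of a feed held in a string, before handing it to a strict XML parser.
sub clean_feed {
    my ($xml) = @_;

    # Drop anything (whitespace, comments, stray CMS output) that sits
    # ahead of the XML declaration. If there is no declaration, the
    # pattern does not match and the string is left unchanged.
    $xml =~ s/\A.*?(?=<\?xml\s)//s;

    # Escape ampersands that do not begin a named entity or a numeric
    # character reference (decimal or hexadecimal).
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    return $xml;
}

# Made-up sample: junk before the declaration and one bare ampersand.
my $dirty = "\n<!-- cms banner -->\n"
          . '<?xml version="1.0"?><rss><channel>'
          . '<title>Fish & Chips &amp; &#169; News</title>'
          . '</channel></rss>';

my $clean = clean_feed($dirty);
print "$clean\n";
```

The cleaned string could then be passed to XML::RSS's parse method inside an eval block, so that a feed which is still broken produces an error you can log rather than killing the program.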

In the node "How do I clean RSS feeds to make them usable?", Matts suggested his rssmirror, the guts of which are now included in both XML::RSS and XML::RSS::Tools.

I became so annoyed with bad XML in RSS feeds that I wrote XML::RSS::Tools to deal with the problems I found. That led to brian d foy taking over XML::RSS, fixing a lot of its problems, and, with time, designing a whole new version.

See also:

Good Luck!

* Parsing RSS At All Costs


--
ajt

