<?xml version="1.0" encoding="windows-1252"?>
<node id="485882" title="Best RSS modules and techniques?" created="2005-08-23 07:36:27" updated="2005-08-23 03:36:27">
<type id="115">
perlquestion</type>
<author id="268515">
xdg</author>
<data>
<field name="doctext">
&lt;p&gt;I tried [Super Search] to see if this had been discussed, but most of the deluge of RSS questions seems to consist of "I'm trying to scrape RSS and I'm clueless, please help", so I gave up in frustration.  Apologies if I missed something obvious somewhere.&lt;/p&gt;
&lt;p&gt;I'm not clueless and I've been working with RSS for a while now (cf. [id://340820]), but I'm a little frustrated with the various incompatibilities and breakage that I encounter when dealing with people's feeds.  I'm currently using combinations of [mod://XML::RSS] and [mod://XML::RAI] -- though largely because that's what I started with.  So my questions are these:
&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;p&gt;What modules for RSS parsing have people found to be the most robust and stable (given unreliable, non-standard input feeds)?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What modules best parse all the various feed standards?  (E.g., the [mod://XML::RSS] docs are inconsistent about RSS 2.0 support.)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What modules best produce all the various feed standards?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;What pre-processing have people found helpful in cleaning up non-standard feeds to keep [mod://XML::Parser] and the like from giving up on errors?&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;On that last point, I'll share my own helpful snippet.  I'm currently doing a rather hackish bit with a regex and [mod://HTML::Entities::Numbered] to fix up some of the broken encodings that I commonly find in various feeds and that were breaking [mod://XML::Parser].  YMMV.&lt;/p&gt;
&lt;code&gt;
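use HTML::Entities::Numbered;
$content =~ s/(&amp;#\d+);?/$1;/g;         # add the trailing semicolon to numeric references that lack one
$content = name2decimal_xml( $content );  # turn named HTML entities into decimal references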
&lt;/code&gt;
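&lt;p&gt;For anyone who wants to try the first step in isolation, here is a minimal sketch using only core Perl.  The sub name terminate_numeric_refs is made up for illustration, and covering hexadecimal references is an assumption that goes beyond the decimal-only regex above.  The ampersand is spelled \x26 so the snippet can be pasted into markup without further escaping.  Untested against real feeds.&lt;/p&gt;

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): make sure every numeric
# character reference ends with a semicolon.
sub terminate_numeric_refs {
    my ($content) = @_;

    # \x26 is the ampersand.  Match decimal (#169) and hexadecimal (#xA9)
    # references; an existing ';' is consumed by ';?' and written back,
    # so running the substitution twice changes nothing.
    $content =~ s/(\x26\#(?:[xX][0-9A-Fa-f]+|\d+));?/$1;/g;

    return $content;
}

# One reference without a semicolon, one that is already well-formed.
my $broken = "Copyright \x26#169 2005, caf\x26#xE9; menu";
print terminate_numeric_refs($broken), "\n";
```

&lt;p&gt;This is only a heuristic: a reference that has lost its semicolon and is immediately followed by more digits (or hex letters, in the hexadecimal case) is ambiguous, and the greedy match will swallow them.&lt;/p&gt;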
&lt;p&gt;Thanks,&lt;/p&gt;
&lt;div class="pmsig"&gt;&lt;div class="pmsig-268515"&gt;
&lt;p&gt;-xdg&lt;/p&gt;
&lt;p&gt;&lt;small&gt;&lt;i&gt;Code written by xdg and posted on PerlMonks is &lt;a href="http://creativecommons.org/licenses/publicdomain"&gt;public domain&lt;/a&gt;. It is provided &lt;b&gt;as is&lt;/b&gt; with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.&lt;/i&gt;&lt;/small&gt;&lt;/p&gt;
&lt;/div&gt;&lt;/div&gt;</field>
</data>
</node>
