Perl RSS aggregator

by Tommy (Chaplain)
on Dec 08, 2004 at 19:16 UTC ( #413299=snippet )

Description:

Install the modules this RSS aggregator requires (XML::RSS::TimingBotDBI and DBI), then customize the MySQL table it uses to match the layout below. Otherwise the script cannot store the RSS feeds it polls, and it will fail:

+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| feedurl | varchar(255) |      | PRI |         |       |
| nextup  | int(11)      | YES  |     | NULL    |       |
| lastmod | varchar(40)  | YES  |     | NULL    |       |
| etag    | varchar(250) | YES  |     | NULL    |       |
| content | longtext     | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
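
For anyone creating the table from scratch, a statement along these lines should produce that layout. This is only a sketch reconstructed from the column listing: the table name `feeds` is the one the script passes to `rssagent_table`, and leaving the storage engine and character set at the server defaults is an assumption.

```sql
-- Sketch only: column names and types copied from the listing above.
-- Table name "feeds" matches the script; engine/charset are left at defaults.
CREATE TABLE feeds (
  feedurl VARCHAR(255) NOT NULL DEFAULT '',
  nextup  INT(11)      NULL,
  lastmod VARCHAR(40)  NULL,
  etag    VARCHAR(250) NULL,
  content LONGTEXT     NULL,
  PRIMARY KEY (feedurl)
);
```

Running `DESCRIBE feeds;` afterwards is a quick way to compare the result against the listing.
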
Then set up the code to run as a cron job every hour.
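
A crontab entry roughly like the following would do that. The script and log paths are placeholders, not anything from the original post:

```crontab
# min hour day-of-month month day-of-week  command
# Run at minute 0 of every hour; append the RSSBOT messages to a log file.
0 * * * * /usr/bin/perl /path/to/rssbot.pl >> /path/to/rssbot.log 2>&1
```

The script itself follows.
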
#!/usr/bin/perl -w
use strict; use warnings;
# list your feeds below in the format shown; leave the rest of the file alone
my(@feeds) = (
   # feedurl                                        # forced refresh in seconds
   ['http://rss.news.yahoo.com/rss/world',          60 * 60],      # hourly
   ['http://www.microsite.reuters.com/rss/topNews', 60 * 60],      # hourly
   ['http://feeds.feedburner.com/TommysNewsAndWorldReport', 60 * 60], # hourly
   ['http://perlmonks.org/index.pl?node_id=30175&xmlstyle=rss', 60 * 60], # hourly
   ['http://www.wordsmith.org/awad/rss1.xml',       60 * 60 * 24], # daily
   ['http://xml.education.yahoo.com/rss/wotd/',     60 * 60 * 24], # daily
   ['http://netrn.net/spywareblog/feed/rss2/',      60 * 60 * 24], # daily
);

# globals
use vars qw( $dbh );

# libraries
use XML::RSS::TimingBotDBI;
use DBI;

# connect to DB
$dbh = DBI->connect(
   q[DBI:mysql:] .
   qq[database=myrssfeeds;] .
   qq[host=localhost;] .
   qq[port=3306],
   '[PUT YOUR USERNAME HERE]',   # MySQL DB username
   '[PUT YOUR PASSWORD HERE]',  # ...and password
   { 'RaiseError' => 0, 'AutoCommit' => 1 }
) or die qq[Aborting!  Failed to connect to database: $DBI::errstr];

foreach (@feeds) {
   my($feed) = $_;

   # check for an entry in the db corresponding to this feed
   my($row) = ( $dbh->selectrow_array(<<__SQL__, undef, $feed->[0]) )[0];
SELECT feedurl FROM feeds WHERE feedurl = ?
__SQL__

   unless ($row) { # auto-create db entry for this feed if it doesn't exist
      $dbh->do(q[INSERT INTO feeds SET feedurl = ?], undef, $feed->[0])
   }

   # grab the feed and save it
   getfeed(@$_);
}

sub getfeed {
   my($rssurl,$maxage) = @_;

   # initialize the RSS bot!
   my($rssbot) = XML::RSS::TimingBotDBI->new;
   $rssbot->rssagent_dbh($dbh);
   $rssbot->rssagent_table('feeds');
   $rssbot->maxAge($maxage) if $maxage;

   # grab the RSS feed
   my($response) = $rssbot->get($rssurl);

   # check response code
   if ($response->code == 200) {
      # save RSS feed content if it was successfully retrieved
      my($sth) =
         $dbh->prepare(q[UPDATE feeds SET content = ? WHERE feedurl = ?])
         or die q[RSSBOT: Aborting!  Problem encountered with MySQL: ]
         . $DBI::errstr;

      $sth->execute($response->content, $rssurl)
         or die q[RSSBOT: Aborting!  Problem encountered with MySQL: ]
         . $DBI::errstr;

      $sth->finish();
      print qq[RSSBOT: RSS feed "$rssurl" freshly retrieved to database\n]
   }
   elsif ($response->code == 304) {
      print qq[RSSBOT: feed "$rssurl" already up to date.  No need to refresh\n]
   }
   else {
      # report the error and abort if there was a problem getting the feed
      die qq[RSSBOT: Aborting!  Problem accessing feed "$rssurl": ]
      . $response->status_line
   }

   # have the rss bot save its RSS lookup history...
   # $rssbot->commit; #<-- only necessary if MySQL auto-commit is off

   # ...or die trying
   die q[RSSBOT: Aborting!  Problem encountered while working with MySQL: ]
     . $DBI::errstr if $DBI::errstr;

   # update OK
   print qq[RSSBOT: update OK at ${\ scalar localtime }\n];
}

# scram
exit;

# disconnect if not already disconnected
END { $dbh->disconnect() if defined $dbh }
Re: Perl RSS aggregator
by merlyn (Sage) on Dec 08, 2004 at 22:34 UTC
    Why MySQL and not PostgreSQL?

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update: To those that downvoted this posting, just be aware that you'll be seeing more just like this one. PostgreSQL is horribly under-known, and I'm going to make sure that every casual mention of MySQL has PostgreSQL mentioned somewhere in the thread. MySQL has had its day. PostgreSQL is now leading in terms of functionality and support, not to mention having a much better license structure.

      Hi merlyn!

      Short answer: because PG just doesn't cut it for me. It's just so clunky imo. I don't like it. Maybe I will someday. But today doesn't seem to be that day.

      --
      Tommy Butler, a.k.a. TOMMY
      

        In that case, why MySQL and not SQLite :)

        --
        <http://www.dave.org.uk>

        "The first rule of Perl club is you do not talk about Perl club."
        -- Chip Salzenberg

      I guess you've not heard of MariaDB then? (Which does better than both.) PostgreSQL is "under-known" for a reason. Stuffing it in people's faces only makes it annoying, and naturally more people will avoid it. What you're suggesting, and doing ("...I'm going to make sure that every casual mention of MySQL has PostgreSQL mentioned..."), is a big disservice to the PostgreSQL community; you're hurting us. So please stop.

        ... So please stop.

        Pay attention to the dates; I doubt he's kept it up since 2004.

Re: Perl RSS aggregator
by Anonymous Monk on Jan 07, 2009 at 00:19 UTC
    This is probably flame bait, but SQLite is not heavily used (in my experience). It is being used by Mac OS X native applications (e.g. AddressBook etc.). On the other hand, chances are very good that your webserver is running an instance of mysqld and that you can get to a mysql command-line prompt pretty easily. Every single time I've come across a tutorial that used SQLite, I had a hard time getting the correct packages installed, tweaking syntax, etc. I think most corporate/high-traffic applications are going to be using MySQL over SQLite. But hey, the beautiful thing about the DBI module in Perl is that it makes switching between underlying database engines very easy.
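
To make the DBI point concrete, here is a rough sketch of how only the data source name changes from one engine to another. The `dsn_for` helper and the SQLite file name are made up for illustration. The DSN formats follow what the DBD::mysql, DBD::Pg and DBD::SQLite drivers document, and nothing here actually connects to a database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the snippet): build a DBI data source
# name for the chosen engine. Only this string differs between engines;
# the DBI->connect, prepare and execute calls keep the same shape.
sub dsn_for {
    my ($engine, %opt) = @_;

    if ($engine eq 'mysql') {
        return "DBI:mysql:database=$opt{database};host=$opt{host};port=$opt{port}";
    }
    if ($engine eq 'Pg') {
        return "dbi:Pg:dbname=$opt{database};host=$opt{host};port=$opt{port}";
    }
    if ($engine eq 'SQLite') {
        return "dbi:SQLite:dbname=$opt{file}";
    }
    die "Unsupported engine: $engine\n";
}

# Database name and host are the ones used in the snippet; the rest is assumed.
print dsn_for('mysql',  database => 'myrssfeeds', host => 'localhost', port => 3306), "\n";
print dsn_for('Pg',     database => 'myrssfeeds', host => 'localhost', port => 5432), "\n";
print dsn_for('SQLite', file => 'myrssfeeds.sqlite'), "\n";
```

The SQL may still need attention when switching: `INSERT INTO feeds SET feedurl = ?` is a MySQL extension, and the portable form is `INSERT INTO feeds (feedurl) VALUES (?)`. Whether XML::RSS::TimingBotDBI behaves the same on other drivers has not been checked here.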
