Category: | XML |
Author/Contact Info: | radixzer0 |
Description: | Make your own Portal! Amaze your friends! Confuse your enemies! This quick hack takes a list of RSS feeds and pulls their headline links into a local database. If you don't know what RSS is, it's the cool XML headline standard used by My Netscape. Lots of sites (like Slashdot.org, ZDNet, etc.) provide RSS feeds that you can use to make headline links on your own site. I used this to make a little headline scroller using DHTML on our company intranet. This script works best with a scheduler (like cron) that re-runs it periodically. For a comprehensive list of available feeds, take a look at http://www.xmltree.com. Comments/improvements more than welcome ;) |
#!c:\perl\bin\perl

use LWP::UserAgent;
use HTTP::Request;
use XML::RSS;
# We're running this off of a Windows machine, connecting to a M$SQL server,
# although any old SQL server would do (e.g. MySQL)
use Win32::ODBC;

$DSN = "TESTSERVER";

# Create a new UserAgent to pull the XML data down
$ua = new LWP::UserAgent;
$ua->agent("HeadlineAgent/0.1 ".$ua->agent);

# Connect via ODBC to the SQL server
if(!($db = new Win32::ODBC($DSN))){
    print "Error connecting to $DSN\n";
    print "Error: " . Win32::ODBC::Error() . "\n";
    exit;
}

# We'll be pulling in RSS files from various sources;
# their URLs are stored in the SQL database
my %sources;

if($db->Sql("SELECT * FROM ExternalNewsSources")) {
    print "SQL failed.\n";
    print "Error: " . $db->Error() . "\n";
    $db->Close();
    exit;
}

while($db->FetchRow()){
    my(%data) = $db->DataHash();
    # Map each source ID to its feed URL
    $sources{$data{'ExternalNewsSourceID'}} = $data{'Source'};
}

# Create the RSS object to parse the RSS files retrieved...
my $rss = new XML::RSS;

($sec,$min,$hour,$mday,$mon,$year) = localtime(time);
# Preformatted string compatible with SQLServer's timestamp field
$nowstring = sprintf("%02i/%02i/%i %02i:%02i:%02i",($mon+1),$mday,($year+1900),$hour,$min,$sec);

# Double any single quotes so feed text can't break the SQL string literals
sub sql_quote {
    my $text = defined $_[0] ? $_[0] : '';
    $text =~ s/'/''/g;
    return $text;
}

# Walk through each of the XML sources
foreach $sourceid (keys %sources) {
    # Fetch RSS file from the source's URL
    my $request = new HTTP::Request GET => $sources{$sourceid};
    my $result = $ua->request($request);
    if($result->is_success) {
        # Grok the RSS file retrieved
        $rss->parse($result->content);
        # Step through all the links in the RSS
        for my $i (@{$rss->{items}}) {
            my $title = sql_quote($i->{'title'});
            my $link  = sql_quote($i->{'link'});
            my $desc  = sql_quote($i->{'description'});
            # Check to see if we've already seen this link from this source before...
            $db->Sql("SELECT * FROM ExternalNews WHERE SourceID=".$sourceid." AND Link = '".$link."'");
            if($db->FetchRow()) {
                # Skip it - it's here already...
            }
            # Sometimes the RSS mis-parses and gives us an empty item
            elsif(length($title) <= 0) {
                # Skip it - it's empty...
            }
            else {
                # Plunk it into the database
                $db->Sql("INSERT INTO ExternalNews (SourceID,PostDate,Title,Link,Description) VALUES ($sourceid,'$nowstring','$title','$link','$desc')");
            }
            # Nuke the current values in the object; it appears that the XML lib
            # recycles the variables without clearing them...
            $i->{'title'} = '';
            $i->{'link'} = '';
            $i->{'description'} = '';
        }
    }
    else {
        # $! is not set by LWP; the HTTP status line says what went wrong
        print "Doh! couldn't get ".$sources{$sourceid}.": ".$result->status_line."\n";
    }
}

# Clean up
$db->Close();
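The two skip rules in the loop (link already stored for this source, empty title) can also be tried out without a database. What follows is only a minimal sketch under assumptions: `new_items` is a hypothetical helper, not part of the script, and it keeps the already-stored links in an in-memory hash instead of running one SELECT per item as the script does.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): given the links already
# stored for one source and the items parsed from its feed, return only the
# items that should be inserted.
sub new_items {
    my ($seen_links, $items) = @_;
    my %seen = map { $_ => 1 } @$seen_links;
    my @fresh;
    for my $item (@$items) {
        my $title = defined $item->{title} ? $item->{title} : '';
        my $link  = defined $item->{link}  ? $item->{link}  : '';
        next if $title eq '';      # mis-parsed or empty item
        next if $seen{$link}++;    # already stored, or repeated within this feed
        push @fresh, $item;
    }
    return @fresh;
}

# Made-up example data; example.com URLs are placeholders.
my @fresh = new_items(
    ['http://example.com/a'],
    [
        { title => 'Old story', link => 'http://example.com/a' },
        { title => '',          link => 'http://example.com/b' },
        { title => 'New story', link => 'http://example.com/c' },
        { title => 'New story', link => 'http://example.com/c' },
    ],
);
print "$_->{title} <$_->{link}>\n" for @fresh;
```

Only the first "New story" item survives here: the first item is already stored, the second has no title, and the fourth repeats a link seen earlier in the same feed. Whether loading a source's stored links up front beats a SELECT per item depends on how many rows the table holds, so treat it as one option rather than a recommendation.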
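For the timestamp, the script's hand-rolled `sprintf` over `localtime` can also be written with `strftime` from the core POSIX module. A small sketch, assuming the same MM/DD/YYYY HH:MM:SS layout is what the database column accepts; `sql_timestamp` is a hypothetical name used only for this illustration.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: format an epoch time in the same
# MM/DD/YYYY HH:MM:SS layout that the script builds with sprintf.
sub sql_timestamp {
    my ($epoch) = @_;
    return strftime('%m/%d/%Y %H:%M:%S', localtime($epoch));
}

print sql_timestamp(time), "\n";
```

This avoids the manual `$mon+1` and `$year+1900` adjustments, since `strftime` handles both.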
Replies are listed 'Best First'.
RE: RSS Headline Sucker
by merlyn (Sage) on Oct 01, 2000 at 00:11 UTC
RE: RSS Headline Sucker
by cei (Monk) on May 03, 2000 at 10:54 UTC
by radixzer0 (Beadle) on May 10, 2000 at 21:13 UTC
by Anonymous Monk on Oct 01, 2000 at 00:05 UTC
by Anonymous Monk on Oct 01, 2000 at 01:24 UTC
by Anonymous Monk on Sep 30, 2000 at 23:58 UTC
RE: RSS Headline Sucker
by perlcgi (Hermit) on Apr 28, 2000 at 00:49 UTC
by radixzer0 (Beadle) on Apr 28, 2000 at 03:27 UTC