
Parsing multiple RSS files

by perl.j (Pilgrim)
on Oct 23, 2013 at 20:58 UTC (node #1059348)
perl.j has asked for the wisdom of the Perl Monks concerning the following question:

Hey Everyone!

I'm trying to hack together a little tool to help me parse RSS feeds. Basically, the code I currently have takes one or more keywords and prints the articles whose titles contain any of them. Here is the code:

use 5.14.2;
use strict;
use warnings;
use XML::RSSLite;
use LWP::Simple;

my @keywords = qw(approach);
my $URL = 'http://www.theguardian.com/theguardian/mainsection/rss';
my $content = get($URL);
my %result;
parseRSS(\%result, \$content);

my $re = join "|", @keywords;
$re = qr/\b(?:$re)\b/i;

foreach my $item (@{ $result{items} }) {
    my $title = $item->{title};
    $title =~ s{\s+}{ };
    $title =~ s{^\s+}{ };
    $title =~ s{\s+$}{ };
    if ($title =~ /$re/) {
        print "$title\n\t$item->{link}\n\n";
    }
}

This gives me the desired effect with one URL, but I need to parse about 20 feeds and print the matching articles from all of them.

I attempted to do this by turning $URL into an array (@URL) and changing that variable throughout the code, but that just gave me several errors.

So, my question is, how can I parse multiple RSS feeds in one script and have all of the output formatted the same way into the same file?

--perl.j

Re: Parsing multiple RSS files
by atcroft (Monsignor) on Oct 23, 2013 at 21:19 UTC

    Would this (untested!) not work?

    use 5.14.2;
    use strict;
    use warnings;
    use XML::RSSLite;
    use LWP::Simple;

    my @keywords = qw(approach);
    my @URLlist = (
        'http://www.theguardian.com/theguardian/mainsection/rss',
        'http://www.theguardian.com/theguardian/mainsection/rss1',
    );

    # The pattern does not depend on the feed, so build it once.
    my $re = join "|", @keywords;
    $re = qr/\b(?:$re)\b/i;

    foreach my $URL ( @URLlist ) {
        my $content = get($URL);
        # get() returns undef on failure; skip that feed and carry on.
        unless (defined $content) {
            warn "Could not fetch $URL\n";
            next;
        }

        my %result;
        parseRSS(\%result, \$content);

        foreach my $item (@{ $result{items} }) {
            my $title = $item->{title};
            $title =~ s{\s+}{ }g;   # collapse every run of whitespace
            $title =~ s{^\s+}{};    # trim leading whitespace
            $title =~ s{\s+$}{};    # trim trailing whitespace
            if ($title =~ /$re/) {
                print "$title\n\t$item->{link}\n\n";
            }
        }
    }
Re: Parsing multiple RSS files
by jethro (Monsignor) on Oct 23, 2013 at 21:25 UTC

    The get() function of LWP::Simple does not accept an array; you need to pass it URLs one at a time, which you do with a loop. That would look like this:

    foreach my $URL (@URL) { ... }
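
Putting the loop advice together, a fuller sketch might look like the following. It is only an illustration under some assumptions: the helper names (clean_title, keyword_re, matching_entries, collect_from_feeds, fetch_items_live) and the example.com feeds are made up for this example, keywords are treated as literal words (hence quotemeta), and the `items` key is taken from the code posted in the thread rather than checked against the XML::RSSLite documentation. The demo at the bottom uses a stand-in fetcher so it runs without network access.

```perl
use strict;
use warnings;

# Collapse runs of whitespace to one space and trim both ends.
sub clean_title {
    my ($title) = @_;
    $title = '' unless defined $title;
    $title =~ s/\s+/ /g;
    $title =~ s/^ //;
    $title =~ s/ $//;
    return $title;
}

# Build one case-insensitive, whole-word pattern from a keyword list.
# Assumption: keywords are literal words, so they are escaped.
sub keyword_re {
    my @keywords = @_;
    die "need at least one keyword\n" unless @keywords;
    my $alt = join '|', map { quotemeta($_) } @keywords;
    return qr/\b(?:$alt)\b/i;
}

# Format every item whose cleaned title matches $re.
sub matching_entries {
    my ($items, $re) = @_;
    my @out;
    for my $item (@$items) {
        my $title = clean_title($item->{title});
        push @out, "$title\n\t$item->{link}\n\n" if $title =~ $re;
    }
    return @out;
}

# Visit the feeds one by one. $fetch_items is a code ref that takes a URL
# and returns an array ref of items, or undef when the fetch fails.
sub collect_from_feeds {
    my ($urls, $re, $fetch_items) = @_;
    my @out;
    for my $url (@$urls) {
        my $items = $fetch_items->($url);
        unless ($items) {
            warn "Could not fetch $url\n";
            next;
        }
        push @out, matching_entries($items, $re);
    }
    return @out;
}

# Fetcher for real feeds, using the modules from the thread. They are
# loaded lazily so the offline demo below works without them installed.
sub fetch_items_live {
    my ($url) = @_;
    require LWP::Simple;
    require XML::RSSLite;
    my $content = LWP::Simple::get($url);
    return unless defined $content;      # get() returns undef on failure
    my %result;
    XML::RSSLite::parseRSS(\%result, \$content);
    return $result{items} || [];         # key name as used in the thread
}

# Offline demo: made-up feeds keyed by made-up URLs.
my %fake_feeds = (
    'http://example.com/feed1' => [
        { title => "A  new approach\n to testing", link => 'http://example.com/a' },
        { title => 'Unrelated story',              link => 'http://example.com/b' },
    ],
    'http://example.com/feed2' => [
        { title => 'Approach with caution', link => 'http://example.com/c' },
    ],
);
my @urls    = sort keys %fake_feeds;
my @entries = collect_from_feeds(\@urls, keyword_re('approach'),
                                 sub { $fake_feeds{ $_[0] } });
print @entries;
```

For real feeds, pass \&fetch_items_live as the third argument in place of the stand-in. To send everything to one file, open a single filehandle before calling collect_from_feeds and print the returned entries to it, so every feed's matches land in the same place with the same formatting.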
