XML::RSS::Parser::Lite Question

by BlenderHead (Novice)
on Nov 18, 2009 at 09:56 UTC

BlenderHead has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Perl People:

I am learning XML::RSS::Parser::Lite, which seems simple enough to use, but I have some questions about the output.

I wrote the following program to aggregate some RSS feeds:

##############################
#!/usr/bin/perl -Tw
use strict;
use XML::RSS::Parser::Lite;
use LWP::Simple;

my $xml = get("http://www.scripting.com/rss.xml");
my $rp = new XML::RSS::Parser::Lite;
$rp->parse($xml);

print "Content-type:text/html\n\n";

my $page_title = $rp->get('title');
my $blog_url = $rp->get('url');
my $blog_desc = $rp->get('description');

print "<a href=$blog_url>$page_title</a>";
print "<BR>$blog_desc<P>";
print "<hr>\n";

for (my $i = 0; $i < $rp->count(); $i++) {
    my $it = $rp->get($i);
    my $item_title = $it->get('title');
    my $item_url = $it->get('url');
    my $item_description = $it->get('description');
    print "<a href=$item_url>$item_title</a><P>$item_description<P>\n";
}
##############################

However, the output looks like this:

http://lab.marketproductions.com/cgi-bin/rss/rsstrial.cgi

As you can see, I got some of the code to work, but the HTML "inside" the posts isn't working.

Can anyone please explain a solution?

Ty.

BH

Replies are listed 'Best First'.
Re: XML::RSS::Parser::Lite Question
by Corion (Patriarch) on Nov 18, 2009 at 10:12 UTC

    The HTML is likely entity-encoded. Have a look at HTML::Entities. Also, you should be aware that malicious HTML could be injected into your page from such a feed if you're not careful. My advice is to let only "safe" HTML tags through, like <p>, <b>, <i>. I wouldn't even embed images, as that implies an HTTP request from the client viewing your aggregate to a potentially unsafe server.
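A minimal sketch of what decoding looks like, assuming the item description comes back with its markup as entities. The sample string is hypothetical (it is not taken from the actual feed), and decoding like this is only reasonable for content you trust:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities);

# Hypothetical description, standing in for what $it->get('description')
# might return: the tags arrive as entities, not as markup.
my $encoded = '&lt;b&gt;Best post ever: &lt;/b&gt;This is a super hoopy post froods';

# In non-void context decode_entities() returns a decoded copy
# and leaves its argument untouched.
my $decoded = decode_entities($encoded);

print "$decoded\n";
```

Printing `$decoded` instead of the raw description would let the browser render the `<b>` tag, which is exactly why the result should not go to the page unfiltered when the feed is untrusted.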

      Corion is correct on all points above. Part of the XML spec is that any text between tags that itself contains markup, e.g.
      <description><b>Best post ever: </b>This is a super hoopy post froods</description>
      must be made XML safe, i.e.
      <description>&lt;b&gt;Best post ever: &lt;/b&gt;This is a super hoopy post froods</description>
      This prevents confusion when using XPath tools.
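The escaping described here can be reproduced with HTML::Entities as a rough illustration. This is a sketch, not what any particular feed generator does; the set of characters passed as "unsafe" (`<>&"`) is an assumption chosen for the example:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# The raw HTML a feed author wants to put inside <description>.
my $html = '<b>Best post ever: </b>This is a super hoopy post froods';

# Escape only the characters that would be mistaken for XML markup.
my $escaped = encode_entities($html, '<>&"');

# Wrap it the way it would appear in the feed.
my $element = "<description>$escaped</description>";
print "$element\n";

# Decoding the escaped text gives back the original HTML.
my $round_trip = decode_entities($escaped);
print "round trip ok\n" if $round_trip eq $html;
```

The aggregator sees the escaped form, so it has to decide deliberately whether, and how much of, the embedded HTML to turn back into markup.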

      On security, if your users are loading remote data from a session on your service, be very, very sure that

      • No JavaScript injection is possible
      • You are not revealing session info (HTTP_REFERER)
      Do not blindly convert the HTML entities back to HTML, as this may result in execution of malicious code within your users' browsers while they are logged into your service.
      The best way of preventing XSS is whitelisting HTML tags and the allowed attributes for each tag
          (consider <b onmouseover="doEvil();">Some text</b> when allowing specific tags). Have a look at HTML::Scrubber.
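A small sketch of tag whitelisting with HTML::Scrubber, assuming the three tags Corion mentioned (p, b, i) are the only ones wanted. No attribute rules are supplied, so this relies on the module's default of rejecting attributes; the `<img>` URL is a made-up placeholder:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Scrubber;

# Whitelist: only <p>, <b> and <i> are allowed through.
# No attribute rules are given, so attributes are not allowed either.
my $scrubber = HTML::Scrubber->new( allow => [qw(p b i)] );

# Already-decoded feed content: an allowed tag carrying an event handler,
# plus a remote image (placeholder URL) that is not on the whitelist.
my $dirty = '<b onmouseover="doEvil();">Some text</b>'
          . ' <img src="http://example.com/x.png">';

my $clean = $scrubber->scrub($dirty);
print "$clean\n";
```

The idea is to decode the entities first and then pass the result through the scrubber, so that only the whitelisted tags reach the page. Which tags and attributes to allow is a policy decision for the aggregator, not something this sketch settles.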

      The best way of retrieving remote images without revealing session info is to ensure all such info is carried in the request headers or body (POST) rather than in the URL, since the URL of your page is what the browser passes along as the referer.

      Edit: And another thing about remote images I'd forgotten to mention: some browsers do content sniffing and ignore the alleged nature of the content. Interesting article on the dangers of content sniffing and how to handle it.

        Thank you for these warnings!

        I am not sure of all the implications of the security issues you mention, but since the above code contained links which would expose HTTP_REFERER, I disabled the script.

        The bulk of my actual intent was to use the RSS scripts on private servers, so only trusted content would be fed to the aggregator. But these issues with public content are good to know, as there's always the wish for greater inclusion.

        If anyone wishes to comment further, please feel free to do so. I'm still not sure how the HTTP_REFERER could be traced, but am looking into it now. Anyway, the script link above will no longer work, though the question is still open for comment.

        Ty.

        BH
