Using XML::RSS to produce an RSS 2.0 feed with Dates within Items

by mldvx4 (Scribe)
on Nov 18, 2019 at 06:44 UTC

mldvx4 has asked for the wisdom of the Perl Monks concerning the following question:

I am looking at an old script by John Bokma that builds an RSS 2.0 file. I would like each Item to include either a PubDate or a Dublin Core date. As it stands, the script silently ignores both ways I have tried to insert dates into items. The variable $modified_date holds the date for a given file in what I believe is a valid format, but I cannot see how to get it into an Item. I have used the add_module method to register the Dublin Core namespace, but I am not sure that is the correct approach.

For a sample run, invoke the script like this in a directory containing HTML files: ./create-rss-feed.pl --dir . --domain example.com --title foo --desc foobar ./*.html
It runs fine, except that it does not add the date to the Item elements, so I am having trouble understanding the add_item method. Neither PubDate => $modified_date, nor dc => { 'dc:date' => $modified_date }, produces an error, but neither gets a date into an Item.

```perl
#!/usr/bin/perl

# based almost entirely on John Bokma's,script from 2005
# from http://johnbokma.com/,
# http://johnbokma.com/perl/rss-web-feed-builder.html

use strict;
use warnings;

use POSIX;
use XML::RSS;
use File::Find;
use Getopt::Long;
use HTML::TreeBuilder;

my $domain;
my $dir;
my $title;
my $description;

GetOptions(
    "dir=s"    => \$dir,
    "domain=s" => \$domain,
    "title=s"  => \$title,
    "desc=s"   => \$description,
) or show_help();

(defined $dir and defined $domain) or show_help();

my ($file_history, $files) = fetch_files($dir);

my $rss = new XML::RSS(version => '2.0');
$rss->channel(
    title       => $title,
    link        => "https://$domain/",
    description => $description,
    pubDate     => strftime("%a, %d %b %Y %H:%M:%S %Z", gmtime time),
    # Thu, 23 Aug 1999 07:00:00 GMT
);

$rss->add_module(prefix=>'dc', uri=>'http://purl.org/rss/1.0/modules/dc/');

foreach my $file (@$files) {

    my ($title, $description) = get_file_meta($file);

    my $link = "https://$domain/" . substr $file, length $dir;
    $link =~ s/index\.html?$//;

    my $modified_date = format_date_time($file_history->{$file});

    $rss->add_item(
        title       => $title,
        link        => $link,
        description => $description,
        PubDate     => $modified_date,
        dc          => { 'dc:date' => $modified_date },
    );
}

print $rss->as_string;

#
#
# PRIVATE METHODS

sub fetch_files {

    my ($dir) = @_;

    my $file_history;

    find sub {

        -f or return;
        /\.html?$/ or return;

        $file_history->{$File::Find::name} = (stat)[9];

    }, $dir;

    # Sort the file on modification time, ascending.
    my @file_names = sort {
        $file_history->{$a} <=> $file_history->{$b}
    } keys %$file_history;

    return ($file_history, \@file_names);
}

sub show_help {

    print <<HELP;
Usage: $0 [options] > index.rss
Options:
    --dir       Path to the document root
    --domain    Domain name
    --title     Title of feed
    --desc      Description of feed
HELP

    exit 1;
}

sub format_date_time {

    my ($time) = @_;

    my @time = gmtime $time;

    return sprintf "%4d-%02d-%02dT%02d:%02d:%02dZ",
        $time[5] + 1900, $time[4] + 1, $time[3],
        $time[2], $time[1], $time[0];
}

sub get_file_meta {

    my ($file_name) = @_;

    my $root = HTML::TreeBuilder->new;
    $root->parse_file($file_name);

    my $title_element = $root->look_down(_tag => 'title');
    my $title = defined $title_element
        ? $title_element->as_text
        : 'N/A';

    my $p_element = $root->look_down(_tag => 'p');
    my $description = defined $p_element
        ? $p_element->as_text
        : ( defined $title_element ? $title : 'N/A' );

    $root->delete;

    return ($title, $description);
}
```

So, if you run the script over some HTML files, you'll see that it makes an RSS file but without dates in any of the Item elements.

Replies are listed 'Best First'.
Re: Using XML::RSS to produce an RSS 2.0 feed with Dates within Items
by Anonymous Monk on Nov 18, 2019 at 07:39 UTC
    You have a typo: PubDate should be pubDate in $rss->add_item.

    Also, from the perldoc: Note: In order to parse and generate dates (such as pubDate and dc:date) it is recommended to use DateTime::Format::Mail and DateTime::Format::W3CDTF, which is what XML::RSS uses internally and requires.
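
The two points above can be combined in a small sketch: generate the date with the DateTime::Format modules the perldoc mentions, then pass it to add_item under the lowercase key pubDate. This is only an illustration under assumptions. The helper names rfc822_date and w3cdtf_date, the example.com URLs, and the placeholder item text are made up here and are not part of the original script.

```perl
use strict;
use warnings;

use DateTime;
use DateTime::Format::Mail;
use DateTime::Format::W3CDTF;
use XML::RSS;

# Hypothetical helper (not in the original script): epoch seconds,
# e.g. a file's mtime, to an RFC 822 style date as used by pubDate.
sub rfc822_date {
    my ($epoch) = @_;
    my $dt = DateTime->from_epoch(epoch => $epoch, time_zone => 'UTC');
    return DateTime::Format::Mail->format_datetime($dt);
}

# Hypothetical helper: epoch seconds to a W3CDTF date as used by dc:date.
sub w3cdtf_date {
    my ($epoch) = @_;
    my $dt = DateTime->from_epoch(epoch => $epoch, time_zone => 'UTC');
    return DateTime::Format::W3CDTF->new->format_datetime($dt);
}

my $rss = XML::RSS->new(version => '2.0');

$rss->channel(
    title       => 'foo',
    link        => 'https://example.com/',
    description => 'foobar',
);

$rss->add_item(
    title       => 'Example page',
    link        => 'https://example.com/page.html',
    description => 'Placeholder description',
    pubDate     => rfc822_date(time),    # lowercase "p", not PubDate
);

print $rss->as_string;
```

In the original script, the equivalent change would be to pass $file_history->{$file} (the raw mtime) to a helper like rfc822_date instead of format_date_time when filling pubDate.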

      Thanks. Well spotted. It works when I use the correct case. I'll redo the way the date is generated, too.

      About the Dublin Core metadata, though, what can be done to add that into RSS 2.0 Items?

        From XML::RSS:
        $rss->add_item (title=>$title, link=>$link, dc=>{ subject=>$subject, creator=>$creator, date=>$date });
Re: Using XML::RSS to produce an RSS 2.0 feed with Dates within Items
by Anonymous Monk on Nov 19, 2019 at 14:20 UTC
    I am looking at an old script by John Bokma to build an RSS 2.0 file. I would like for each Item to include either a PubDate or a Dublin Core date.

    You can't do that with RSS 2.0; try 1.0:

    XML::RSS->new(version => '1.0')
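
The replies differ on whether a Dublin Core date can appear in RSS 2.0 items, so one way to settle it for an installed XML::RSS is to build the same one-item feed under both versions and look at the output. The sketch below does that, using the unprefixed dc => { date => ... } key form from the XML::RSS documentation. The helper name sample_feed, the example.com URLs, and the fixed date string are assumptions for illustration only, and no particular result for version 2.0 is claimed here.

```perl
use strict;
use warnings;

use XML::RSS;

# Hypothetical helper: build a one-item feed for the given RSS version
# and return it as a string. The item carries only a Dublin Core date.
sub sample_feed {
    my ($version) = @_;

    my $rss = XML::RSS->new(version => $version);

    $rss->channel(
        title       => 'foo',
        link        => 'https://example.com/',
        description => 'foobar',
    );

    $rss->add_item(
        title       => 'Example page',
        link        => 'https://example.com/page.html',
        description => 'Placeholder description',
        dc          => { date => '2019-11-18T06:44:00Z' },    # key is "date", not "dc:date"
    );

    return $rss->as_string;
}

# Report, per version, whether an item in the output contains dc:date.
for my $version ('1.0', '2.0') {
    my $found = sample_feed($version) =~ /<item\b.*?<dc:date>/s ? 'yes' : 'no';
    print "RSS $version item contains dc:date: $found\n";
}
```

If the 2.0 line reports "no", pubDate remains the way to date items in that version, as confirmed earlier in the thread.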
