Category: PerlMonks Related Scripts
Author/Contact Info Zenon Zabinski |
Description: This module grabs the most recent posts using XML::Parser and generates an HTML file containing a list of links to those posts.

By default, the file is written to perlmonks.html. If that file already exists, a backup will be created at Operlmonks.html before the file is overwritten.

I guess you need to have the WebFetch module installed to run this.

Special thanks to:
- OeufMayo for creating the XML::Parser tutorial that helped me create this.
- mirod for helping me with the XML::Parser problem.

- xml_char () was altered to read in entire string as mirod suggested
- the tests to return () were moved to xml_end () from xml_start () to keep from reading too many strings into one field

Suggestions please ...

# - get recent posts on
# Copyright (c) 2001 Zenon Zabinski (
# All rights reserved. This program is free software;
# you can redistribute it and/or modify it under the
# same terms as Perl itself.
# Based on the source code of the module 
# WebFetch::DebianNews and WebFetch::Slashdot.

package WebFetch::PerlMonks;

use strict;
use vars qw ($VERSION @ISA @EXPORT @Options $parser @bad_nodes @posts $post);

use Exporter;
use XML::Parser;
use WebFetch;

@ISA = qw (Exporter WebFetch);
@EXPORT = qw (fetch_main);

# configuration parameters
$WebFetch::PerlMonks::filename = "perlmonks.html";
$WebFetch::PerlMonks::num_links = 30;
$WebFetch::PerlMonks::url = "

# no user-serviceable parts beyond this point

# XML stuff
$parser = XML::Parser->new (
    Handlers => {
        Start => \&xml_start,
        End   => \&xml_end,
        Char  => \&xml_char
    }
);

@bad_nodes = ('note', 'user', 'categorized answer');

sub fetch_main { WebFetch::run (); }

sub fetch {
    my ( $self ) = @_;

    # set parameters for WebFetch routines
    $self->{url} = $WebFetch::PerlMonks::url;
    $self->{num_links} = $WebFetch::PerlMonks::num_links;
    $self->{table_sections} = $WebFetch::PerlMonks::table_sections;

    # process the links
    my $content = $self->get;
    $parser->parse ($$content);

    # newest posts first (createtime is a YYYYMMDDhhmmss stamp)
    my @temp_posts = sort { $$b[1] <=> $$a[1] } @posts;
    undef @posts;

    for (my $i = 0; $i < $self->{num_links} && @temp_posts; $i++) {
        # rewrite the stamp as hh:mm:ss DD-MM-YYYY
        $temp_posts[0][1] =~ s/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/$4:$5:$6 $3-$2-$1/;
        $temp_posts[0][2] = "". $tem
        push @posts, shift (@temp_posts);
    }

    $self->html_gen ( $WebFetch::PerlMonks::filename,
        sub { return "<a href=\"".$_[2]."\">".$_[0]."</a> (".$_[1].")"; },
        \@posts );

    # export content if --export was specified
    if ( defined $self->{export}) {
        $self->wf_export( $self->{export},
            [ "title", "date", "url" ],
            "Exported from WebFetch::PerlMonks\n"
                ."\"title\" is article title\n"
                ."\"date\" is the date stamp\n"
                ."\"url\" is article URL" );
    }
}

sub xml_start {
    my ($p, $el, %atts) = @_;
    # start each element with an empty title; xml_char appends to it
    $atts{'title'} = '';
    $post = \%atts;
}

sub xml_end {
    my ($p, $el) = @_;
    return unless $el eq 'NODE';
    # skip unwanted node types (attributes live in $post, not in %atts, here)
    return if grep { $_ eq $post->{'nodetype'} } @bad_nodes;
    push @posts, [$post->{'title'}, $post->{'createtime'}, $post->{'no
}

sub xml_char {
    my ($p, $title) = @_;
    # append: the parser may deliver one string in several pieces
    $post->{'title'} .= $title;
}

1;

# POD docs follow

=head1 NAME

WebFetch::PerlMonks - generate a file of recent posts


=head1 SYNOPSIS

In perl scripts:

    use WebFetch::PerlMonks; &fetch_main

From the command line:

    perl -w -MWebFetch::PerlMonks -e "&fetch_main" -- --dir directory


=head1 DESCRIPTION

This module grabs the most recent posts using
XML::Parser and generates an HTML file containing a list of
links to those posts.

By default, the file is written to perlmonks.html. If that file
already exists, a backup will be created at Operlmonks.html
before the file is overwritten.

=head1 AUTHOR

WebFetch was written by Ian Kluft
for the Silicon Valley Linux User Group (SVLUG).

The WebFetch::PerlMonks module was written by Zenon Zabinski.
Send patches or maintenance requests for this module to

=head1 SEE ALSO


Replies are listed 'Best First'.
Re: WebFetch::PerlMonks
by mirod (Canon) on Jun 02, 2001 at 09:36 UTC

    I am afraid you have the usual XML::Parser problem: the Char handler does not guarantee that it will return the entire content of an element at once. It can be called several times for a single string, depending on entities being present and on the overall length of the document.

    So if one of the titles includes an entity, as in "I Love B&D Perl", the Char handler will be called 3 times: once with 'I Love B', once with '&' and once with 'D Perl', and you will only get the last part in $post->{title}.

    The solution in this case is simply to replace $post->{'title'} = $title; by $post->{'title'} .= $title;, but look for a more generic solution in the review.
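The accumulate-then-flush pattern described above can be sketched without a parser at all. The snippet below is only an illustration under stated assumptions: `on_char`, `on_end`, and `@titles` are hypothetical names, and the three Char calls are simulated by hand instead of coming from XML::Parser. In a real run the two subs would be registered through `Handlers => { Char => \&on_char, End => \&on_end }`.

```perl
use strict;
use warnings;

# Hypothetical handler state, mirroring the module's $post hash.
my $post = { title => '' };
my @titles;

# Char handler: append, because one string may arrive in several pieces.
sub on_char {
    my ($p, $text) = @_;
    $post->{title} .= $text;
}

# End handler: only here is the accumulated text known to be complete.
sub on_end {
    my ($p, $el) = @_;
    push @titles, $post->{title};
    $post = { title => '' };    # reset for the next element
}

# Simulate the three Char calls described above for a single title.
on_char(undef, $_) for ('I Love B', '&', 'D Perl');
on_end(undef, 'NODE');

print "$titles[0]\n";    # prints "I Love B&D Perl"
```

With plain assignment (`=`) in `on_char`, the same three calls would leave only 'D Perl' in the title, which is the symptom described above.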