WebFetch::PerlMonks

by zdog (Priest)
on Jun 02, 2001 at 08:13 UTC ( #85153=sourcecode )

Category: PerlMonks.org Related Scripts
Author/Contact Info Zenon Zabinski | zdog7@hotmail.com
Description: This module grabs the most recent PerlMonks.org posts using XML::Parser and generates an HTML file containing a list of links to those posts.

By default, the file is written to perlmonks.html. If that file already exists, a backup will be created at Operlmonks.html before the file is overwritten.

You will also need the WebFetch module installed to run this.
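Each post ends up as a one-line link in that file. Here is a minimal standalone sketch of the formatter callback that gets handed to WebFetch's html_gen() in the code below; the sample title, date, and URL are made up:

```perl
#!/usr/bin/perl
use strict;

# Sketch of the per-item formatter passed to html_gen():
# it receives (title, date, url) and returns one line of perlmonks.html.
my $format = sub {
    return "<a href=\"" . $_[2] . "\">" . $_[0] . "</a> (" . $_[1] . ")";
};

print $format->('WebFetch::PerlMonks', '08:13:00 02-06-2001',
                'http://www.perlmonks.org/?node_id=85153'), "\n";
# <a href="http://www.perlmonks.org/?node_id=85153">WebFetch::PerlMonks</a> (08:13:00 02-06-2001)
```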

Special thanks to :
- OeufMayo for creating the XML::Parser tutorial that helped me create this.
- mirod for helping me with the XML::Parser problem.

Bugfixes:
- xml_char() was altered to append the string pieces rather than overwrite them, as mirod suggested
- the early-return tests were moved from xml_start() to xml_end() to keep from reading too many strings into one field

Suggestions please ...

#
# WebFetch::PerlMonks.pm - get recent posts on PerlMonks.org
#
# Copyright (c) 2001 Zenon Zabinski (zdog7@hotmail.com).
# All rights reserved. This program is free software;
# you can redistribute it and/or modify it under the
# same terms as Perl itself.
#
# Based on the source code of the module 
# WebFetch::DebianNews and WebFetch::Slashdot.
#

package WebFetch::PerlMonks;

use strict;
use vars qw ($VERSION @ISA @EXPORT @Options $parser @bad_nodes @posts $post);

use Exporter;
use XML::Parser;
use WebFetch;

@ISA = qw (Exporter WebFetch);
@EXPORT = qw (fetch_main);

# configuration parameters
$WebFetch::PerlMonks::filename = "perlmonks.html";
$WebFetch::PerlMonks::num_links = 30;
$WebFetch::PerlMonks::url = "http://www.perlmonks.org/index.pl?node=newest+nodes+xml+generator";

# no user-serviceable parts beyond this point

# XML stuff
$parser = XML::Parser->new (
    Handlers => {
        Start => \&xml_start,
        End   => \&xml_end,
        Char  => \&xml_char
    },
);

@bad_nodes = ('note', 'user', 'categorized answer');

sub fetch_main { WebFetch::run (); }

sub fetch
{
    my ( $self ) = @_;

    # set parameters for WebFetch routines
    $self->{url} = $WebFetch::PerlMonks::url;
    $self->{num_links} = $WebFetch::PerlMonks::num_links;
    $self->{table_sections} = $WebFetch::PerlMonks::table_sections;

    # process the links
    my $content = $self->get;
    $parser->parse ($$content);

    my @temp_posts = sort { $$b[1] <=> $$a[1] } @posts;
    undef @posts;

    for (my $i = 0; $i < $self->{num_links} && @temp_posts; $i++)
    {
        $temp_posts[0][1] =~ s/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/$4:$5:$6 $3-$2-$1/;
        $temp_posts[0][2] = "http://www.perlmonks.org/?node_id=" . $temp_posts[0][2];
        push @posts, shift (@temp_posts);
    }
    
    $self->html_gen ( $WebFetch::PerlMonks::filename,
        sub { return "<a href=\"" . $_[2] . "\">" . $_[0] . "</a> (" . $_[1] . ")"; },
        \@posts );

    # export content if --export was specified
    if ( defined $self->{export}) {
        $self->wf_export( $self->{export},
            [ "title", "date", "url" ],
            \@posts,
            "Exported from WebFetch::PerlMonks\n"
                ."\"title\" is article title\n"
                ."\"date\" is the date stamp\n"
                ."\"url\" is article URL" );
    }
}

sub xml_start
{
    my ($p, $el, %atts) = @_;
    $atts{'title'} = '';
    $post = \%atts;
}

sub xml_end
{
    my ($p, $el) = @_;
    return unless $el eq 'NODE';
    return if grep { $_ eq $post->{'nodetype'} } @bad_nodes;    # skip replies, users, etc.
    push @posts, [$post->{'title'}, $post->{'createtime'}, $post->{'node_id'}];
}

sub xml_char
{
    my ($p, $title) = @_;
    $post->{'title'} .= $title;
}

1;

__END__

# POD docs follow

=head1 NAME

WebFetch::PerlMonks - generate a file of recent PerlMonks.org posts

=head1 SYNOPSIS

In Perl scripts:

    use WebFetch::PerlMonks;
    &fetch_main;

From the command line:

    perl -w -MWebFetch::PerlMonks -e "&fetch_main" -- --dir directory

=head1 DESCRIPTION

This module grabs the most recent PerlMonks.org posts using
XML::Parser and generates an HTML file containing a list of
links to those posts.

By default, the file is written to perlmonks.html. If that file
already exists, a backup will be created at Operlmonks.html
before the file is overwritten.

=head1 AUTHOR

WebFetch was written by Ian Kluft
for the Silicon Valley Linux User Group (SVLUG).

The WebFetch::PerlMonks module was written by Zenon Zabinski.
Send patches or maintenance requests for this module to
C<zdog7@hotmail.com>.

=head1 SEE ALSO

WebFetch

=cut


Re: WebFetch::PerlMonks
by mirod (Canon) on Jun 02, 2001 at 09:36 UTC

    I am afraid you have the usual XML::Parser problem: the Char handler does not guarantee that it will return the entire content of an element at once; it can be called several times for a single string, depending on entities being present and on the overall length of the document.

    So if one of the titles includes an entity, as in "I Love B &D Perl", the Char handler will be called 3 times: once with 'I Love B', once with '&', and once with 'D Perl', and you will only get the last part in $post->{title}.

    The solution in this case is simply to replace $post->{'title'} = $title; with $post->{'title'} .= $title;, but look for a more generic solution in the review.
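    mirod's point can be demonstrated without a live parse by simulating the handler calls by hand; the three string pieces below are the ones from his example:

```perl
#!/usr/bin/perl
use strict;

# Simulate XML::Parser calling the Char handler in pieces, as it
# may when the character data contains an entity.
my %post;
sub char_assign { my ($p, $s) = @_; $post{title} = $s;  }   # old, buggy handler
sub char_append { my ($p, $s) = @_; $post{title} .= $s; }   # fixed handler

my @pieces = ('I Love B ', '&', 'D Perl');

char_assign(undef, $_) for @pieces;
print "$post{title}\n";    # D Perl  (only the last piece survives)

$post{title} = '';
char_append(undef, $_) for @pieces;
print "$post{title}\n";    # I Love B &D Perl
```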
