WebFetch::PerlMonks

by zdog (Priest)
on Jun 02, 2001 at 08:13 UTC ( #85153=sourcecode )
Category: PerlMonks Related Scripts
Author/Contact Info Zenon Zabinski | zdog7@hotmail.com
Description: This module grabs the most recent PerlMonks.org posts using XML::Parser and generates an HTML file containing a list of links to those posts.

By default, the file is written to perlmonks.html. If that file already exists, a backup will be created at Operlmonks.html before the file is overwritten.

You need the WebFetch and XML::Parser modules installed to run this.

Special thanks to :
- OeufMayo for creating the XML::Parser tutorial that helped me create this.
- mirod for helping me with the XML::Parser problem.

Bugfixes:
- xml_char () was altered to read in the entire string, as mirod suggested
- the tests to return () were moved to xml_end () from xml_start () to keep from reading too many strings into one field
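
The node-type test from the second bugfix can be tried on its own. This is only a rough sketch, not the module's code: it uses an exact hash lookup where the module uses grep, and the sample attribute hashes (including the 'perlquestion' type) are made up for illustration.

```perl
use strict;
use warnings;

# Node types to skip, as listed in the module's @bad_nodes.
my %is_bad = map { $_ => 1 } ('note', 'user', 'categorized answer');

# Hypothetical attribute hashes, as they might look once an element has ended.
my @seen = (
    { nodetype => 'perlquestion', node_id => 1 },
    { nodetype => 'note',         node_id => 2 },
    { nodetype => 'user',         node_id => 3 },
);

# Keep only the nodes whose type is not on the skip list.
my @kept = grep { !$is_bad{ $_->{nodetype} } } @seen;

print "kept node $_->{node_id}\n" for @kept;
```

A hash lookup compares the type as a plain string, so characters that are special in a regex cannot change the result.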

Suggestions please ...

#
# WebFetch::PerlMonks.pm - get recent posts on PerlMonks.org
#
# Copyright (c) 2001 Zenon Zabinski (zdog7@hotmail.com).
# All rights reserved. This program is free software;
# you can redistribute it and/or modify it under the
# same terms as Perl itself.
#
# Based on the source code of the module 
# WebFetch::DebianNews and WebFetch::Slashdot.
#

package WebFetch::PerlMonks;

use strict;
use vars qw ($VERSION @ISA @EXPORT @Options $parser @bad_nodes @posts $post);

use Exporter;
use XML::Parser;
use WebFetch;

@ISA = qw (Exporter WebFetch);
@EXPORT = qw (fetch_main);

# configuration parameters
$WebFetch::PerlMonks::filename = "perlmonks.html";
$WebFetch::PerlMonks::num_links = 30;
$WebFetch::PerlMonks::url = "http://www.perlmonks.org/index.pl?node=newest+nodes+xml+generator";

# no user-serviceable parts beyond this point

# XML stuff
$parser = XML::Parser->new (
    Handlers => {
        Start => \&xml_start,
        End   => \&xml_end,
        Char  => \&xml_char
    },
);

@bad_nodes = ('note', 'user', 'categorized answer');

sub fetch_main { WebFetch::run (); }

sub fetch
{
    my ( $self ) = @_;

    # set parameters for WebFetch routines
    $self->{url} = $WebFetch::PerlMonks::url;
    $self->{num_links} = $WebFetch::PerlMonks::num_links;
    $self->{table_sections} = $WebFetch::PerlMonks::table_sections;

    # process the links
    my $content = $self->get;
    $parser->parse ($$content);

    my @temp_posts = sort { $$b[1] <=> $$a[1] } @posts;
    undef @posts;

    for (my $i = 0; $i < $self->{num_links} && @temp_posts; $i++)
    {
        $temp_posts[0][1] =~ s/(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})/$4:$5:$6 $3-$2-$1/;
        $temp_posts[0][2] = "http://www.perlmonks.org/?node_id=" . $temp_posts[0][2];
        push @posts, shift (@temp_posts);
    }
    
    $self->html_gen ( $WebFetch::PerlMonks::filename, 
        sub { return "<a href=\"".$_[2]."\">".$_[0]."</a> (".$_[1].")"; },
        \@posts );

    # export content if --export was specified
    if ( defined $self->{export}) {
        $self->wf_export( $self->{export},
            [ "title", "date", "url" ],
            \@posts,
            "Exported from WebFetch::PerlMonks\n"
                ."\"title\" is article title\n"
                ."\"date\" is the date stamp\n"
                ."\"url\" is article URL" );
    }
}

sub xml_start
{
    my ($p, $el, %atts) = @_;
    $atts{'title'} = '';
    $post = \%atts;
}

sub xml_end
{
    my ($p, $el) = @_;
    return unless $el eq 'NODE';
    # %atts is lexical to xml_start, so read the saved attributes via $post
    return if grep { $_ eq $post->{'nodetype'} } @bad_nodes;
    push @posts, [$post->{'title'}, $post->{'createtime'}, $post->{'node_id'}];
}

sub xml_char
{
    my ($p, $title) = @_;
    $post->{'title'} .= $title;
}

1;

__END__

# POD docs follow

=head1 NAME

WebFetch::PerlMonks - generate a file of recent PerlMonks.org posts

=head1 SYNOPSIS

In perl scripts:

C<use WebFetch::PerlMonks; &fetch_main>

From the command line:

C<perl -w -MWebFetch::PerlMonks -e "&fetch_main" -- --dir directory>

=head1 DESCRIPTION

This module grabs the most recent PerlMonks.org posts using
XML::Parser and generates an HTML file containing a list of
links to those posts.

By default, the file is written to perlmonks.html. If that file
already exists, a backup will be created at Operlmonks.html
before the file is overwritten.

=head1 AUTHOR

WebFetch was written by Ian Kluft
for the Silicon Valley Linux User Group (SVLUG).

The WebFetch::PerlMonks module was written by Zenon Zabinski.
Send patches or maintenance requests for this module to
C<zdog7@hotmail.com>.

=head1 SEE ALSO

WebFetch

=cut
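
For anyone who wants to try the sorting and timestamp rewriting from fetch () without WebFetch or a network connection, here is a small standalone sketch. The two sample rows are invented, and the link text is built by hand rather than through html_gen (), so treat it as an illustration of the idea rather than the module's output.

```perl
use strict;
use warnings;

# Hypothetical sample rows in the module's [title, createtime, node_id] layout.
my @posts = (
    [ 'Older post', '20010601120000', 85000 ],
    [ 'Newer post', '20010602081300', 85153 ],
);

# Newest first: createtime is a fixed-width YYYYMMDDhhmmss string,
# so a numeric comparison orders it chronologically.
my @sorted = sort { $b->[1] <=> $a->[1] } @posts;

my @links;
for my $post (@sorted) {
    my ($title, $stamp, $id) = @$post;

    # Rewrite a copy of the stamp as "hh:mm:ss DD-MM-YYYY".
    (my $when = $stamp) =~ s/^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/$4:$5:$6 $3-$2-$1/;

    push @links, qq{<a href="http://www.perlmonks.org/?node_id=$id">$title</a> ($when)};
}

print "$_\n" for @links;
```

Working on a copy of each row leaves the input untouched, whereas fetch () modifies the rows in place before handing them to html_gen ().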


Replies are listed 'Best First'.
Re: WebFetch::PerlMonks
by mirod (Canon) on Jun 02, 2001 at 09:36 UTC

    I am afraid you have the usual XML::Parser problem: the Char handler is not guaranteed to receive the entire content of an element at once. It can be called several times for a single string, depending on entities being present and on the overall length of the document.

    So if one of the titles includes an entity, as in "I Love B&amp;D Perl", the Char handler will be called 3 times: once with 'I Love B', once with '&' and once with 'D Perl', and you will only get the last part in $post->{title}.

    The solution in this case is simply to replace $post->{'title'} = $title; by $post->{'title'} .= $title;, but look for a more generic solution in the review.
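
The append-then-collect pattern described above might look roughly like this. It is a minimal sketch, assuming XML::Parser is installed; the sample XML string and the variable names are made up for the example and are not taken from the real feed.

```perl
use strict;
use warnings;
use XML::Parser;

my @titles;    # finished titles, one per NODE element
my $current;   # text collected for the NODE being read, undef outside one

my $parser = XML::Parser->new(
    Handlers => {
        # A new NODE starts: begin with an empty buffer.
        Start => sub {
            my ($p, $el) = @_;
            $current = '' if $el eq 'NODE';
        },
        # Char may fire several times per element, so append.
        Char => sub {
            my ($p, $text) = @_;
            $current .= $text if defined $current;
        },
        # Only at the end tag is the text known to be complete.
        End => sub {
            my ($p, $el) = @_;
            return unless $el eq 'NODE';
            push @titles, $current;
            undef $current;
        },
    },
);

# Hypothetical sample input containing an entity inside the title.
$parser->parse('<NODES><NODE node_id="1">I Love B&amp;D Perl</NODE></NODES>');

print "$_\n" for @titles;
```

However many times the Char handler fires, the title is only used once the End handler has seen the closing tag, so it arrives in one piece.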
