PerlMonks  

Re^2: Finding posts with zero replies.

by Limbic~Region (Chancellor)
on Aug 30, 2004 at 16:11 UTC (node #386948)


in reply to Re: Finding posts with zero replies.
in thread Finding posts with zero replies.

PodMaster,
I spent a little time with Super Search and couldn't find anything applicable. I then looked at PTAV and didn't see a way to do this. That's when I whipped up the following:

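#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableContentParser;
use HTML::TokeParser::Simple;
use WWW::Mechanize;

# Category filter appended to each PTAV day URL; restricts the listing to SoPW
use constant SOPW => '&ct=12';

# autocheck makes every failed request die instead of failing silently
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( 'http://www.tinymicros.com/ptav/index.pl' );

my $file = $ARGV[0] || 'noreplies.txt';
open my $out, '>', $file or die "Cannot open $file: $!";
select $out;    # make the output file the default handle...
$| = 1;         # ...and unbuffer it so partial results survive a crash

print $out "<html>\n<ul>\n";

# Walk the archive: year -> month -> day
for my $year ( $mech->find_all_links( url_regex => qr/year/ ) ) {
    $mech->get( $year->url() );
    for my $month ( $mech->find_all_links( url_regex => qr/month/ ) ) {
        $mech->get( $month->url() );
        for my $day ( $mech->find_all_links( url_regex => qr/day/ ) ) {
            $mech->get( $day->url() . SOPW );

            # The node listing is the second-to-last table on the day page
            my $table = HTML::TableContentParser->new()->parse( $mech->content() );
            for my $row ( @{ $table->[-2]{rows} } ) {
                for my $cell ( @{ $row->{cells} } ) {
                    # A reply count of "(0)" marks a node nobody has answered
                    if ( $cell->{data} =~ /\(0\)/ ) {
                        print $out "<li>", clean_link( $cell ), "</li>\n";
                        next;
                    }
                }
            }
            sleep 3;    # be polite to the PTAV server
            $mech->back();
        }
        $mech->back();
    }
    $mech->back();
}

print $out "</ul>\n</html>\n";
close $out or die "Cannot close $file: $!";

# Turn a PTAV table cell into a link that points straight at the node on PerlMonks
sub clean_link {
    my $link = shift;
    my $p = HTML::TokeParser::Simple->new( \$link->{data} );
    my $node;
    while ( my $token = $p->get_token ) {
        last if $token->is_end_tag;
        if ( $token->is_start_tag( 'a' ) ) {
            # The node id is the trailing number of the PTAV href
            ($node) = $token->return_attr( 'href' ) =~ /(\d+)$/;
            next;
        }
        # Only build a link once the node id has been seen
        if ( $token->is_text && defined $node ) {
            return "<a href='http://www.perlmonks.org/index.pl?node_id=$node'>"
                . $token->as_is . "</a>";
        }
    }
    return '';    # no link found in this cell
}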
It screen-scrapes PTAV one day at a time and writes an HTML list of every root SoPW (Seekers of Perl Wisdom) node that has no replies. The two alternatives I have seen are even more limited:
  • Use a modified view of Newest Nodes
      • This doesn't allow you to look at anything older than a certain date and has no means of filtering beyond visual cues.
  • Use PTAV as built
      • This requires looking day by day and has no means of filtering beyond visual cues.

Cheers - L~R

Update: Added an explanation of the screen scraping and modified the code to look only at SoPW, since that was all that was being asked for. It still needs a resume capability so that, if it breaks, you can start where you left off.
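One possible way to add the missing resume capability, sketched here as an assumption rather than anything the script above already does, is to record each finished day URL in a small state file and skip those URLs on the next run. The state file name noreplies.state and the helper names load_done and mark_done are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical state file name; not part of the original script
my $STATE_FILE = 'noreplies.state';

# Load the set of day URLs already processed on a previous run.
# Returns a hash ref; it is empty when the state file does not exist yet.
sub load_done {
    my ($file) = @_;
    my %done;
    return \%done unless -e $file;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    while ( my $url = <$fh> ) {
        chomp $url;
        $done{$url} = 1 if length $url;
    }
    close $fh;
    return \%done;
}

# Append one finished day URL right away, so a crash loses at most one day.
sub mark_done {
    my ( $file, $url ) = @_;
    open my $fh, '>>', $file or die "Cannot append to $file: $!";
    print {$fh} "$url\n";
    close $fh or die "Cannot close $file: $!";
}
```

To wire it in, load the state once before the year loop (my $done = load_done($STATE_FILE);), add next if $done->{ $day->url() }; at the top of the day loop, before the get so that no matching back() is needed, and call mark_done( $STATE_FILE, $day->url() ) just after the sleep. On a resumed run the output file would also have to be opened with '>>' instead of '>', and the opening <html><ul> printed only on a fresh run, otherwise the results already found are truncated.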

