
Re^2: Finding posts with zero replies.

by Limbic~Region (Chancellor)
on Aug 30, 2004 at 16:11 UTC (#386948)


in reply to Re: Finding posts with zero replies.
in thread Finding posts with zero replies.

PodMaster,
I spent a little time with Super Search and couldn't find anything applicable. I then looked at PTAV and didn't see a way to do this. That's when I whipped up the following:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableContentParser;
use HTML::TokeParser::Simple;
use WWW::Mechanize;

# Query-string suffix that restricts a PTAV day view to SoPW
use constant SOPW => '&ct=12';

my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( 'http://www.tinymicros.com/ptav/index.pl' );

my $outfile = $ARGV[0] || 'noreplies.txt';
open (OUTPUT, '>', $outfile) or die "Unable to open $outfile for writing: $!";
select OUTPUT;
$| = 1;

print OUTPUT "<html>\n<ul>\n";

# Walk the PTAV index: year -> month -> day
for my $year ( $mech->find_all_links( url_regex => qr/year/ ) ) {
    $mech->get( $year->url() );
    for my $month ( $mech->find_all_links( url_regex => qr/month/ ) ) {
        $mech->get( $month->url() );
        for my $day ( $mech->find_all_links( url_regex => qr/day/ ) ) {
            $mech->get( $day->url() . SOPW );
            my $table = HTML::TableContentParser->new()->parse( $mech->content() );
            # The node listing is the second-to-last table on the page
            for my $row ( @{ $table->[-2]{rows} } ) {
                for my $cell ( @{ $row->{cells} } ) {
                    # A cell containing "(0)" is a node with zero replies
                    if ( $cell->{data} =~ /\(0\)/ ) {
                        print OUTPUT "<li>", clean_link( $cell ), "</li>\n";
                        next;
                    }
                }
            }
            sleep 3;    # be polite to the server
            $mech->back();
        }
        $mech->back();
    }
    $mech->back();
}
print OUTPUT "</ul>\n</html>\n";

# Turn a PTAV table cell into a link to the node on PerlMonks
sub clean_link {
    my $link = shift;
    my $p = HTML::TokeParser::Simple->new( \$link->{data} );
    my $node;
    while ( my $token = $p->get_token ) {
        last if $token->is_end_tag;
        if ( $token->is_start_tag( 'a' ) ) {
            # The node id is the trailing run of digits in the href
            ($node) = $token->return_attr( 'href' ) =~ /(\d+)$/;
            next;
        }
        if ( $token->is_text ) {
            return "<a href='http://www.perlmonks.org/index.pl?node_id=$node'>" . $token->as_is . "</a>";
        }
    }
}

It generates a list of all root SoPW nodes without replies. The two alternatives I have seen are more limited:
  • Use a modified view of Newest Nodes
      • This doesn't let you look back past a certain date and has no means of filtering beyond visual cues.
  • Use PTAV as built
      • This requires looking day by day and has no means of filtering beyond visual cues.

Cheers - L~R

Update: Added explanation of screen scraping and modified the code to only look at SoPW, since that was all that was being asked for. It still needs a resume capability so that if it breaks you can start where you left off.
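
One way the missing resume capability might look is a small checkpoint file that records each day URL once it has been fully processed. This is only a sketch, not part of the script as posted: the helper names load_done, mark_done and pending and the one-URL-per-line checkpoint format are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers (not in the original script): they remember which
# day URLs have been fully processed so a later run can skip them.

# Read the checkpoint file (one URL per line) into a lookup hash.
# A missing file simply means nothing has been processed yet.
sub load_done {
    my ($file) = @_;
    my %done;
    return \%done unless -e $file;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    while ( my $url = <$fh> ) {
        chomp $url;
        $done{$url} = 1 if length $url;
    }
    close $fh;
    return \%done;
}

# Append one finished URL to the checkpoint file and update the hash.
sub mark_done {
    my ( $file, $done, $url ) = @_;
    open my $fh, '>>', $file or die "Cannot append to $file: $!";
    print {$fh} "$url\n";
    close $fh or die "Cannot close $file: $!";
    $done->{$url} = 1;
    return;
}

# Return only the URLs that still need to be fetched.
sub pending {
    my ( $done, @urls ) = @_;
    return grep { !$done->{$_} } @urls;
}
```

In the day loop this would amount to a `next if $done->{ $day->url() };` before the `get`, and a `mark_done( $checkpoint, $done, $day->url() );` once that day's rows have been printed. A resumed run would also have to open the output file with '>>' instead of '>' and skip the opening `<html>` and `<ul>` lines, otherwise the earlier results get overwritten.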

