
Re^2: Finding posts with zero replies.

by Limbic~Region (Chancellor)
on Aug 30, 2004 at 16:11 UTC ( #386948=note )

in reply to Re: Finding posts with zero replies.
in thread Finding posts with zero replies.

I spent a little time with Super Search and couldn't find anything applicable. I then looked at PTAV and didn't see a way to do this. That's when I whipped up the following:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableContentParser;
    use HTML::TokeParser::Simple;
    use WWW::Mechanize;

    # Query-string suffix that restricts a day view to SoPW
    use constant SOPW => '&ct=12';

    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get( '' );    # starting URL is missing from the source

    open (OUTPUT, '>', $ARGV[0] || 'noreplies.txt')
        or die "Cannot open output file: $!";
    select OUTPUT;
    $| = 1;              # unbuffered, so partial results survive a crash
    print OUTPUT "<html>\n<ul>\n";

    # Walk year -> month -> day, scraping each day's listing
    for my $year ( $mech->find_all_links( url_regex => qr/year/ ) ) {
        $mech->get( $year->url() );
        for my $month ( $mech->find_all_links( url_regex => qr/month/ ) ) {
            $mech->get( $month->url() );
            for my $day ( $mech->find_all_links( url_regex => qr/day/ ) ) {
                $mech->get( $day->url() . SOPW );
                my $table = HTML::TableContentParser->new()->parse( $mech->content() );
                for my $row ( @{ $table->[-2]{rows} } ) {
                    for my $cell ( @{ $row->{cells} } ) {
                        # A cell containing "(0)" marks a node with zero replies
                        if ( $cell->{data} =~ /\(0\)/ ) {
                            print OUTPUT "<li>", clean_link( $cell ), "</li>\n";
                            next;
                        }
                    }
                }
                sleep 3;    # be polite to the server
                $mech->back();
            }
            $mech->back();
        }
        $mech->back();
    }
    print OUTPUT "</ul>\n</html>\n";

    # Rebuild a link from the first anchor in the cell: node id plus title text
    sub clean_link {
        my $link = shift;
        my $p = HTML::TokeParser::Simple->new( \$link->{data} );
        my $node;
        while ( my $token = $p->get_token ) {
            last if $token->is_end_tag;
            if ( $token->is_start_tag( 'a' ) ) {
                ($node) = $token->return_attr( 'href' ) =~ /(\d+)$/;
                next;
            }
            if ( $token->is_text ) {
                # the base URL before "=$node" is missing from the source
                return "<a href='=$node'>" . $token->as_is . "</a>";
            }
        }
    }

It generates a list of all root SoPW nodes without replies. The two alternatives I have seen are even more lacking:
  • Use a modified view of Newest Nodes
      • This doesn't allow you to look at anything past a certain date and has no means of filtering beyond visual cues.
  • Use PTAV as built
      • This requires looking day by day and has no means of filtering beyond visual cues.

Cheers - L~R

Update: Added an explanation of the screen scraping and modified the code to look only at SoPW, since that was all that was being asked for. It still needs a resume capability so that if it breaks you can start where you left off.
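
A minimal sketch of how a resume capability could work. It is not part of the script above, and everything in it is an assumption: the checkpoint file name `noreplies.checkpoint`, the helper names `load_checkpoint`, `save_checkpoint` and `remaining`, and the idea of using each day's URL as the checkpoint key. The approach is to record the last day that was fully processed and, on the next run, skip everything up to and including it. It uses only core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical checkpoint file; the original script has no such file.
my $checkpoint = 'noreplies.checkpoint';

# Return the last completed key, or undef if no checkpoint exists yet.
sub load_checkpoint {
    my ($file) = @_;
    return undef unless -e $file;
    open my $fh, '<', $file or die "Cannot read $file: $!";
    my $key = <$fh>;
    close $fh;
    return undef unless defined $key;
    chomp $key;
    return length $key ? $key : undef;
}

# Record a key as completed, overwriting any previous checkpoint.
sub save_checkpoint {
    my ( $file, $key ) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$key\n";
    close $fh or die "Cannot close $file: $!";
}

# Given the last completed key and the full ordered list of keys,
# return only the keys that still need processing.
sub remaining {
    my ( $last, @keys ) = @_;
    return @keys unless defined $last;
    for my $i ( 0 .. $#keys ) {
        return @keys[ $i + 1 .. $#keys ] if $keys[$i] eq $last;
    }
    return @keys;    # checkpoint key not found: start from the beginning
}

# Stand-ins for the day URLs the scraper would collect.
my @days = ( 'day-1', 'day-2', 'day-3' );

for my $day ( remaining( load_checkpoint($checkpoint), @days ) ) {
    # ... fetch and parse the day's listing here ...
    save_checkpoint( $checkpoint, $day );    # only after the day succeeded
}
```

To fit this into the scraper, the output file would also have to be opened in append mode (`'>>'`) on a resumed run, and the opening `<html>` and `<ul>` tags printed only on a fresh start; otherwise the earlier results are overwritten.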
