I thought of doing that myself as well. You could, of course, set up a cron job that simply fetches the Selected Best Nodes page into a timestamped file. I've set up the following cron job instead, which archives that page into an SQLite database and then writes a reputation-sorted list of everything it has archived so far to bestnodes.html. That way, after a while, I'll have my own Top 5000 list.
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TableExtract;
    use LWP::Simple;
    use DBI;

    my $db_file = "best_nodes";
    my $pm_site = "http://perlmonks.org/index.pl?node_id=%d";

    # Create the table only on the first run, before the db file exists.
    my $make_table = ! -f $db_file;

    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", "", "")
        or die "Can't connect to db: $DBI::errstr";

    $dbh->do( qq[
        create table nodes (
            id      int unique,
            title   varchar(255),
            auth_id int,
            author  varchar(255),
            rep     int
        )
    ]) if $make_table;

    # Fetch the Selected Best Nodes page (node 328478).
    my $html = get( sprintf $pm_site, 328478 )
        or die "Can't fetch the Best Nodes page";

    # keep_html => 1 keeps the <a> tags so the node ids can be recovered.
    my $te = HTML::TableExtract->new(
        headers   => [ qw/Node Author Rep/ ],
        keep_html => 1,
    );
    $te->parse($html);

    foreach my $row ($te->rows) {
        my ($node, $author, $rep) = @$row;
        my ($id)      = $node   =~ /\?node_id=(\d+)/;
        my ($auth_id) = $author =~ /\?node_id=(\d+)/;
        next unless defined $id;    # skip rows without a node link
        ($rep)        = $rep    =~ /(\d+)/;
        my ($title)   = $node   =~ m{>(.+?)</a>$};
        ($author)     = $author =~ m{>(.+?)</a>$};

        # Replace any earlier copy of this node with the current numbers.
        $dbh->do("delete from nodes where id=?", undef, $id);
        $dbh->do("insert into nodes values (?,?,?,?,?)", undef,
                 $id, $title, $auth_id, $author, $rep);
    }

    my $sth = $dbh->prepare( qq[
        select id,title,auth_id,author,rep from nodes order by rep desc
    ]);
    $sth->execute;

    open my $fh, ">", "bestnodes.html"
        or die "Can't write bestnodes.html: $!";
    print $fh "<table>\n";
    while (my ($id, $title, $auth_id, $author, $rep) = $sth->fetchrow_array) {
        $id      = sprintf $pm_site, $id;
        $auth_id = sprintf $pm_site, $auth_id;
        print $fh qq[
    <tr><td><a href="$id">$title</a></td>
    <td><a href="$auth_id">$author</a></td>
    <td>$rep</td></tr>
    ];
    }
    print $fh "</table>\n";
    close $fh or die "Can't close bestnodes.html: $!";
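The delete-then-insert pair in the loop amounts to an upsert keyed on the node id: each run overwrites a node's old numbers, and the final query lists everything by reputation. (Since `id` is declared unique, SQLite's `insert or replace` could also do this in one statement.) Below is a minimal core-Perl sketch of the same merge logic, assuming a plain in-memory hash in place of the SQLite table; `merge_rows`, `sorted_by_rep`, and the sample rows are hypothetical and only illustrate the idea.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the "nodes" table: node id => row hashref.
my %archive;

# Merge freshly scraped rows into the archive. A later row for the same
# node id overwrites the earlier one, like the delete-then-insert pair.
sub merge_rows {
    my ($archive, @rows) = @_;
    for my $row (@rows) {
        $archive->{ $row->{id} } = { %$row };
    }
    return scalar keys %$archive;
}

# Return rows ordered by reputation, highest first ("order by rep desc").
# Ties are broken by node id so the order is deterministic.
sub sorted_by_rep {
    my ($archive) = @_;
    return sort { $b->{rep} <=> $a->{rep} or $a->{id} <=> $b->{id} }
           values %$archive;
}

# Two hypothetical scrapes; node 100 reappears with a higher reputation.
merge_rows(\%archive,
    { id => 100, title => 'First node',  author => 'alice', rep => 50 },
    { id => 200, title => 'Second node', author => 'bob',   rep => 80 },
);
merge_rows(\%archive,
    { id => 100, title => 'First node',  author => 'alice', rep => 95 },
    { id => 300, title => 'Third node',  author => 'carol', rep => 10 },
);

for my $row (sorted_by_rep(\%archive)) {
    printf "%4d  %-12s %s\n", $row->{rep}, $row->{title}, $row->{author};
}
```

The hash only lives for one run; to keep it between cron runs without a database, the core Storable module's `store` and `retrieve` could save and reload it, though SQLite is the sturdier choice once the archive grows.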
Incidentally, this is my first experience with HTML::TableExtract, and it's just perfect for this job. Maybe I'll post the best nodes archive on my homepage once it gets big enough.
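With `keep_html => 1`, each cell comes back as raw HTML, which is why the loop pulls the node id and the link text out with regexes. A small core-Perl sketch of that step follows, assuming cells shaped like `<a href="...?node_id=123">Title</a>`; `parse_link` is a hypothetical helper, and real PerlMonks markup may differ. It reuses the script's patterns but also tolerates trailing whitespace after `</a>`.

```perl
use strict;
use warnings;

# Hypothetical helper: split an anchor cell into (node_id, link_text).
# Assumes markup like <a href="...?node_id=123">Text</a>; returns an
# empty list if either part cannot be found.
sub parse_link {
    my ($cell) = @_;
    my ($id)   = $cell =~ /\?node_id=(\d+)/;
    my ($text) = $cell =~ m{>(.+?)</a>\s*$};
    return unless defined $id && defined $text;
    return ($id, $text);
}

my ($id, $title) = parse_link('<a href="/index.pl?node_id=328478">Best Nodes</a>');
print "$id: $title\n" if defined $id;
```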