Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Unless you have a good ego, this is not such a cool use of Perl. If you do have good ego, then you may have noticed ActiveState's Mail Archive Leaders feature.

Following a series of events described in my use Perl journal, I created a small script that renders the count slightly more accurate. If you're on one of those lists, you might want to try it out. Ah the good old HTML scraping days...

CAVEAT: I never got the hang of format in Perl, hence the ugliness around the end. Any pointers to good sources (the perl docs didn't help much, for some weird reason) of examples are definitely appreciated.

#!/usr/bin/perl # getleaders [list-name] -- get the ten first people on ASPN archives, + with better acuracy # getleaders (defaults to perl-xml) # getleaders xml-dev (gets xml-dev) use strict; use vars qw($IN_PERSON %people); use LWP::Simple qw(); use HTML::Parser qw(); use constant BASE_URL => 'http://aspn.activestate.com/ASPN/Mail/Leader +s/'; my $list = shift || 'perl-xml'; my $url = BASE_URL . $list . '/'; my $html = LWP::Simple::get($url) or die "Could not get $url"; my $p = HTML::Parser->new( api_version => 3, start_h => [\&start_handler, 'tagname, attr'], text_h => [\&text_handler, 'dtext'], ); $p->unbroken_text(1); $p->parse($html); $p->eof; sub start_handler { my $tag = shift; my $attr = shift; if ($tag eq 'a' and $attr->{title} =~ m/Click to see postings by this + author/) { $IN_PERSON = 'person'; } } sub text_handler { my $txt = shift; return unless $IN_PERSON; $txt =~ s/^\s+//; $txt =~ s/\s+$//; if ($IN_PERSON eq 'person') { normalize(\$txt); $IN_PERSON = $txt; } elsif ($txt =~ m/\d+ posts/) { $people{$IN_PERSON} += $txt; # this numifies $IN_PERSON = undef; } } # this is very ad hoc sub normalize { my $txt = shift; $$txt = 'Ilya Sterin' if $$txt eq 'Sterin, Ilya'; $$txt = 'Barrie Slaymaker' if $$txt eq 'barries'; } # sort and print the result my @results = map { [ $_, $people{$_} ] } sort { $people{$b} <=> $peop +le{$a} } keys %people; my $longest = 0; for my $r (@results) { my $len = length $r->[0]; $longest = $len if $len > $longest; } my $nlen = length $results[0]->[1]; for my $i (0..9) { my $pad = ($i == 9) ? '' : '0'; print $pad . ($i + 1) . '. '; my $ppad = $longest - length $results[$i]->[0]; my $npad = $nlen - length $results[$i]->[1]; print $results[$i]->[0] . ' ' x $ppad . ' ' . ' ' x $npad . $resul +ts[$i]->[1] . "\n"; }

-- darobin -- knowscape 2 coming soon --


In reply to A little vanity by darobin

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2024-05-28 01:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found