merlyn,
This certainly isn't as succinct, but I wanted to offer my contribution. It gets 110 of the 118 non-exclusive pics; all 8 of Marisa Miller's are skipped. It was fun - *shrug*.
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TableContentParser;
use HTML::TokeParser::Simple;
use Data::Dumper;

# Indexes of the tables that hold the model links and the thumbnails
use constant MODELS => 3;
use constant PICS   => 4;

-d "RESULTS" or mkdir "RESULTS", 0755 or die "cannot mkdir RESULTS: $!";
chdir 'RESULTS' or die "cannot chdir RESULTS: $!";

my $url  = 'http://sportsillustrated.cnn.com/';
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( $url . 'features/2005_swimsuit/' );
my $table = HTML::TableContentParser->new()->parse( $mech->content() );

# Each cell of the MODELS table should link to one model's gallery page
for my $row ( @{ $table->[MODELS]{rows} } ) {
    for my $cell ( @{ $row->{cells} } ) {
        my $link = get_link( $cell );
        if ( $link ) {
            $mech->get( $url . $link );
            my $t = HTML::TableContentParser->new()->parse( $mech->content() );
            for my $r ( @{ $t->[PICS]{rows} } ) {
                for my $c ( @{ $r->{cells} } ) {
                    my $p = get_pic( $c );
                    if ( $p ) {
                        # Local filename is everything after the last slash
                        my ($file) = $p =~ m|/([^/]+)$|;
                        print "Checking $file\n";
                        if (-e $file) {
                            print "skipping $file - already have\n";
                        }
                        else {
                            print "Downloading $file\n";
                            # Add error handling with $response if you want
                            my $response = $mech->mirror($p, $file);
                            # Pause 0.05 seconds between downloads
                            select(undef, undef, undef, .05);
                        }
                    }
                    else {
                        next;
                    }
                }
            }
        }
        else {
            print STDERR "No link found in ", $cell->{data}, "\n";
        }
    }
}

# Return the href if the cell's first token is an <a> start tag
sub get_link {
    my $link = shift;
    my $p = HTML::TokeParser::Simple->new( \$link->{data} );
    my $token = $p->get_token;
    return $token->is_start_tag( 'a' ) ? $token->return_attr( 'href' ) : undef;
}

# Look at the first <img> in the cell; if its src is a thumbnail
# (..._0Nt.jpg), return the full-size name (..._0N.jpg)
sub get_pic {
    my $link = shift;
    my $p = HTML::TokeParser::Simple->new( \$link->{data} );
    while ( my $token = $p->get_token ) {
        next if ! $token->is_start_tag( 'img' );
        my $src = $token->return_attr( 'src' );
        if ( $src =~ m|/05_[\w]+_0\dt\.jpg$|i ) {
            $src =~ s/t\.jpg$/.jpg/i;
            return $src;
        }
        else {
            return undef;
        }
    }
    return undef;
}
I didn't bother fixing it to get the exclusive photos, but it shouldn't be too difficult. Incidentally, there is a non-linked photo we both miss: Carolyn Murphy has a hidden #2 pic.
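For anyone who wants to poke at the filename handling without hitting the site, here is a minimal sketch of just the thumbnail-to-full-size step from get_pic. The helper names (full_size_from_thumb, local_name) and the sample paths are made up for illustration; only the two regexes come from the script above. Note that the pattern only accepts picture numbers of the form 0N, so a thumbnail numbered 10 or higher would not match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a thumbnail src to its full-size counterpart, or return undef
# when the src does not look like a 2005 gallery thumbnail.
# (Hypothetical helper; same regexes as get_pic in the script.)
sub full_size_from_thumb {
    my ($src) = @_;
    return undef unless defined $src;
    return undef unless $src =~ m|/05_\w+_0\dt\.jpg$|i;
    ( my $full = $src ) =~ s/t\.jpg$/.jpg/i;
    return $full;
}

# Local filename: everything after the last slash.
sub local_name {
    my ($path) = @_;
    my ($file) = $path =~ m|/([^/]+)$|;
    return $file;
}

# Sample paths are invented for illustration, not taken from the site.
for my $src (
    '/features/2005_swimsuit/images/05_example_model_01t.jpg',
    '/features/2005_swimsuit/images/05_example_model_12t.jpg',
    '/images/spacer.gif',
) {
    my $full = full_size_from_thumb($src);
    if ( defined $full ) {
        printf "%s -> %s\n", $src, local_name($full);
    }
    else {
        print "$src -> (no match)\n";
    }
}
```

Only the first sample path passes the pattern; the other two fall through to the "no match" branch, which is the same case where the script silently moves on to the next cell.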