PerlMonks  

Re: Swimsuits2005

by Limbic~Region (Chancellor)
on Feb 23, 2005 at 02:19 UTC (#433549)


in reply to Swimsuits2005
in thread Swimsuits2004

merlyn,
This certainly isn't as succinct, but I wanted to offer my contribution. It gets 110 of the 118 non-exclusive pics; all 8 of Marisa Miller's are skipped. It was fun - *shrug*.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TableContentParser;
use HTML::TokeParser::Simple;
use Data::Dumper;

use constant MODELS => 3;
use constant PICS   => 4;

-d "RESULTS" or mkdir "RESULTS", 0755 or die "cannot mkdir RESULTS: $!";
chdir 'RESULTS' or die "cannot chdir RESULTS: $!";

my $url  = 'http://sportsillustrated.cnn.com/';
my $mech = WWW::Mechanize->new( autocheck => 1 );
$mech->get( $url . 'features/2005_swimsuit/' );

my $table = HTML::TableContentParser->new()->parse( $mech->content() );

for my $row ( @{ $table->[MODELS]{rows} } ) {
    for my $cell ( @{ $row->{cells} } ) {
        my $link = get_link( $cell );
        if ( $link ) {
            $mech->get( $url . $link );
            my $t = HTML::TableContentParser->new()->parse( $mech->content() );
            for my $r ( @{ $t->[PICS]{rows} } ) {
                for my $c ( @{ $r->{cells} } ) {
                    my $p = get_pic( $c );
                    if ( $p ) {
                        my ($file) = $p =~ m|/([^/]+)$|;
                        print "Checking $file\n";
                        if (-e $file) {
                            print "skipping $file - already have\n";
                        }
                        else {
                            print "Downloading $file\n";
                            # Add error handling with $response if you want
                            my $response = $mech->mirror($p, $file);
                            select(undef, undef, undef, .05);
                        }
                    }
                    else {
                        next;
                    }
                }
            }
        }
        else {
            print STDERR "No link found in ", $cell->{data}, "\n";
        }
    }
}

sub get_link {
    my $link  = shift;
    my $p     = HTML::TokeParser::Simple->new( \$link->{data} );
    my $token = $p->get_token;
    return $token->is_start_tag( 'a' ) ? $token->return_attr( 'href' ) : undef;
}

sub get_pic {
    my $link = shift;
    my $p    = HTML::TokeParser::Simple->new( \$link->{data} );
    while ( my $token = $p->get_token ) {
        next if ! $token->is_start_tag( 'img' );
        my $src = $token->return_attr( 'src' );
        if ( $src =~ m|/05_[\w]+_0\dt\.jpg$|i ) {
            $src =~ s/t\.jpg$/.jpg/i;
            return $src;
        }
        else {
            return undef;
        }
    }
    return undef;
}
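For anyone wanting to poke at the filename handling without hitting the site, here is a minimal sketch of just the two regex steps: the thumbnail-to-full-size rewrite from get_pic and the local filename extraction from the main loop. The helper names full_size_url and local_name and the sample paths are made up for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a thumbnail URL to its full-size counterpart,
# or return undef when the URL does not look like a gallery thumbnail.
sub full_size_url {
    my ($src) = @_;
    return undef unless defined $src;
    # Same pattern get_pic uses: ".../05_<name>_0<digit>t.jpg" at the end.
    return undef unless $src =~ m|/05_\w+_0\dt\.jpg$|i;
    ( my $full = $src ) =~ s/t\.jpg$/.jpg/i;    # drop the trailing "t"
    return $full;
}

# Hypothetical helper: the local filename is everything after the last slash.
sub local_name {
    my ($url) = @_;
    my ($file) = $url =~ m|/([^/]+)$|;
    return $file;
}

# Made-up sample paths, for illustration only.
for my $src ( '/images/05_example_01t.jpg', '/images/spacer.gif' ) {
    my $full = full_size_url($src);
    if ( defined $full ) {
        printf "%s -> %s (saved as %s)\n", $src, $full, local_name($full);
    }
    else {
        print "$src -> skipped\n";
    }
}
```

Copying the string before the substitution leaves the caller's value alone, which makes it easier to log both the thumbnail and the full-size name.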
I didn't bother fixing it to get the exclusive photos, but it shouldn't be too difficult. Incidentally, there is a non-linked photo we both miss: Carolyn Murphy has a hidden #2 pic.

Cheers - L~R

