PerlMonks  

Perl Mechanize optimization: make a script run faster with less overhead

by Perlbeginner1 (Scribe)
on Feb 19, 2012 at 23:49 UTC ( #954942=perlquestion )
Perlbeginner1 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, this thread is about optimizing a Perl Mechanize script so that it runs faster with less overhead.

Problem: I have a list of 2500 websites and need to grab a thumbnail screenshot of each one. How do I do that? I could fetch the sites with Perl, and Mechanize looks like a good fit. Note: I only need the results as thumbnails that are at most 240 pixels in the long dimension.

Prerequisites:
the Firefox add-on MozRepl: https://addons.mozilla.org/en-US/firefox/addon/mozrepl/
the module WWW::Mechanize::Firefox
the module Imager: http://search.cpan.org/~tonyc/Imager-0.87/Imager.pm

First Approach: Here is a first Perl solution:

use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://google.com');
my $png = $mech->content_as_png();



Outline: content_as_png returns the given tab or the current page rendered as a PNG image. All parameters are optional. $tab defaults to the current tab. If coordinates are given, that rectangle will be cut out. The coordinates should be a hash with the four usual entries: left, top, width, height. This is specific to WWW::Mechanize::Firefox.
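A minimal sketch of the coordinates option described in the outline, assuming that passing undef for $tab falls back to the current tab as the quoted documentation says. The helper names top_rect and png_of_top_area are hypothetical and only illustrate the call; note that this crops the page, it does not shrink it.

```perl
use strict;
use warnings;

# Hypothetical helper: build the coordinates hash for a rectangle
# anchored at the top-left corner of the page.
sub top_rect {
    my ($width, $height) = @_;
    return { left => 0, top => 0, width => $width, height => $height };
}

# Hypothetical helper: load $url and return only the top-left
# rectangle of the rendered page as PNG data.
# $mech is expected to be a WWW::Mechanize::Firefox object.
sub png_of_top_area {
    my ($mech, $url, $width, $height) = @_;
    $mech->get($url);
    # undef for $tab: assumed to mean "use the current tab"
    return $mech->content_as_png(undef, top_rect($width, $height));
}

# Usage (needs Firefox with the MozRepl add-on running):
#   use WWW::Mechanize::Firefox;
#   my $mech = WWW::Mechanize::Firefox->new();
#   my $png  = png_of_top_area($mech, 'http://google.com', 1024, 768);
```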
As I understand the perldoc, the coordinates option does not resize the whole page; it only cuts a rectangle out of it. WWW::Mechanize::Firefox takes care of capturing the screenshots. I forgot to mention that I only need the images as small thumbnails, so there is no need for very large files. I searched CPAN for a module that scales down the $png and found Imager.
WWW::Mechanize::Firefox does not concern itself with resizing images; for that there are the various image modules on CPAN, such as Imager: http://search.cpan.org/~tonyc/Imager-0.87/Imager.pm
Imager - Perl extension for Generating 24 bit Images: Imager is a module for creating and altering images. It can read and write various image formats, draw primitive shapes like lines and polygons, blend multiple images together in various ways, scale, crop, render text and more. I installed the module, but I have not yet extended my basic approach to use it.
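A minimal sketch of how the PNG data could be scaled down with Imager so that the long side is at most 240 pixels. The helper names thumb_size and png_thumbnail are hypothetical; the sketch assumes Imager was built with PNG support and that the whole image fits in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: compute thumbnail dimensions so that the longer
# side is at most $max pixels, keeping the aspect ratio.
# Images that are already small enough are left unchanged.
sub thumb_size {
    my ($w, $h, $max) = @_;
    my $long = $w > $h ? $w : $h;
    return ($w, $h) if $long <= $max;
    my $factor = $max / $long;
    my $tw = int($w * $factor + 0.5);
    my $th = int($h * $factor + 0.5);
    $tw = 1 if $tw < 1;    # very thin images still need one pixel
    $th = 1 if $th < 1;
    return ($tw, $th);
}

# Hypothetical helper: take PNG data (for example from content_as_png)
# and return PNG data for the thumbnail.
sub png_thumbnail {
    my ($png, $max) = @_;
    require Imager;        # loaded lazily; must be installed
    my $img = Imager->new;
    $img->read(data => $png, type => 'png')
        or die "Cannot read PNG: " . $img->errstr;
    my ($tw, $th) = thumb_size($img->getwidth, $img->getheight, $max);
    # 'nonprop' uses exactly the dimensions computed above
    my $thumb = $img->scale(xpixels => $tw, ypixels => $th, type => 'nonprop')
        or die "Cannot scale: " . $img->errstr;
    my $out;
    $thumb->write(data => \$out, type => 'png')
        or die "Cannot write PNG: " . $thumb->errstr;
    return $out;
}

# Usage: my $small = png_thumbnail($png, 240);
```

Because full-page screenshots are often much taller than they are wide, the height will usually be the side that ends up at 240 pixels.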

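What I have tried already:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

# urls.txt holds one site per line
open(my $input, '<', 'urls.txt') or die "Cannot open urls.txt: $!";
while (<$input>) {
    chomp;
    print "$_\n";
    $mech->get($_);
    my $png = $mech->content_as_png();

    # derive the output file name from the site name
    my $name = $_;
    $name =~ s/^www\.//;
    $name .= ".png";

    # PNG data is binary, so write it in binary mode
    open(my $output, '>', $name) or die "Cannot write $name: $!";
    binmode $output;
    print {$output} $png;
    close($output) or die "Cannot close $name: $!";

    sleep(5);
}
close($input);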



Well, this does not take care of the size, and it stops with a timeout. See the command-line output:

linux-vi17:/home/martin/perl # perl mecha_test_1.pl
www.google.com
www.cnn.com
www.msnbc.com
command timed-out at /usr/lib/perl5/site_perl/5.12.3/MozRepl/Client.pm line 186
linux-vi17:/home/martin/perl #

This is my source list, urls.txt:

www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com
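The entries in urls.txt carry no scheme, and the script derives each file name directly from the entry. A minimal sketch of how both steps could be made more explicit follows; the helper names normalize_url and png_name_for are hypothetical, and defaulting to http:// is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure an entry from urls.txt has a scheme.
# Entries without one are assumed to be plain http sites.
sub normalize_url {
    my ($entry) = @_;
    $entry =~ s/^\s+|\s+$//g;                       # trim whitespace
    return $entry if $entry =~ m{^[a-z][a-z0-9+.-]*://}i;
    return "http://$entry";
}

# Hypothetical helper: derive a safe PNG file name from an entry.
sub png_name_for {
    my ($entry) = @_;
    my $name = $entry;
    $name =~ s/^\s+|\s+$//g;                        # trim whitespace
    $name =~ s{^[a-z][a-z0-9+.-]*://}{}i;           # drop any scheme
    $name =~ s/^www\.//;                            # as in the script
    $name =~ s{[^A-Za-z0-9._-]+}{_}g;               # no slashes etc.
    return "$name.png";
}

# Usage inside the loop:
#   $mech->get(normalize_url($_));
#   my $name = png_name_for($_);
```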


Question: how can I extend the solution so that it does not stop on a timeout, and so that it stores only small thumbnails?
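One common Perl idiom for the first half of the question is to wrap the work for each site in eval, so that a failure on one site is recorded and the loop moves on. This is only a sketch, assuming the timeout surfaces as an exception (the "command timed-out at ..." message in the output looks like one); the helper name process_all is hypothetical, and it does not make the slow page itself load any faster.

```perl
use strict;
use warnings;

# Hypothetical helper: run $worker on every item, catching exceptions
# so that a single failure (such as a timeout) does not end the run.
# Returns a list of finished items and a hash of item => error message.
sub process_all {
    my ($items, $worker) = @_;
    my (@done, %failed);
    for my $item (@$items) {
        my $ok = eval { $worker->($item); 1 };
        if ($ok) {
            push @done, $item;
        }
        else {
            my $err = $@ || 'unknown error';
            $err =~ s/\s+$//;          # drop the trailing newline
            $failed{$item} = $err;
        }
    }
    return (\@done, \%failed);
}

# Usage:
#   my ($done, $failed) = process_all(\@urls, sub {
#       my ($url) = @_;
#       $mech->get($url);
#       my $png = $mech->content_as_png();
#       # ... scale and save $png here ...
#   });
#   warn "$_: $failed->{$_}\n" for sort keys %$failed;
```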

Note again: I only need the results as thumbnails that are at most 240 pixels in the long dimension.
As a prerequisite, I have already installed the module Imager: http://search.cpan.org/~tonyc/Imager-0.87/Imager.pm
I would love to hear from you!

Reaped: Re: Perl Mechanize optimization: make a script run faster with less overhead
by NodeReaper (Curate) on Feb 20, 2012 at 00:12 UTC
Re: Perl Mechanize optimization: make a script run faster with less overhead
by marto (Chancellor) on Feb 20, 2012 at 10:49 UTC
