
Is Perl capable of doing this?

by anabelljc (Initiate)
on Apr 06, 2012 at 17:35 UTC ( #963846=perlquestion )
anabelljc has asked for the wisdom of the Perl Monks concerning the following question:

Here's my task: I have a website where a new piece of information (mostly numbers and graphs) is posted every 10 days. I'm trying to create a Perl script that visits each of these websites (there are a few like this one), does the equivalent of File -> Print -> PDF with 50% scaling, and saves all the PDF files in my directory. I haven't begun the task yet and am looking for the language that will minimize the manual side of the job. I have been a Perl user for some time, and to me it's the best language for text processing, but I don't know much about this aspect of Perl. Does anyone know whether this is doable in Perl, and can you recommend where I should start? Thanks.

Replies are listed 'Best First'.
Re: Is Perl capable of doing this?
by jdporter (Canon) on Apr 06, 2012 at 18:12 UTC

    If you want the resulting PDF to capture the look & layout of the web page, then I think you're talking about driving the browser, i.e. with Selenium. It's easy to drive the browser to go to certain sites, look for elements on the page, and so on; but I'm not sure how easy it is (or if it's even possible) to access browser features such as Print. Assuming it's possible, then probably what you'd want to do is configure the PDF virtual printer to be the default on your system, with the 50% scaling, default save location, and all that. Even then, there may be some GUI elements you have to access manually, so I'm not sure that this is fully automatable.

    An alternative approach might be to use a web service out there somewhere which converts web pages to PDFs. For example, there's PDFcrowd. However, with this one, it looks like the options you get for free don't meet your needs (scaling, for example). You could search around for others.

    Updated to fix some tpyos.

    I reckon we are the only monastery ever to have a dungeon stuffed with 16,000 zombies.
      the options you get for free don't meet your needs (scaling, for example).

      As a thought, every PDF viewer I've used auto-scales for display.

      It might be better for the OP to capture full-sized and let the viewer choose what scaling is best, rather than doing the scaling up front.

      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      The start of some sanity?

      I haven't actually tried, but I think it may be possible because you can send key sequences using WWW::Selenium. If you've configured your defaults properly, it might be as simple as sending ctrl+p followed by an enter. Your script may have to go in and move/rename the resulting file afterwards.
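A minimal sketch of the move/rename step mentioned above, using only core Perl modules. It assumes the PDF printer always writes to one known file; the sub name `archive_pdf`, the `$label` argument, and the dated file-name pattern are illustrative assumptions, not anything taken from WWW::Selenium or a particular PDF printer.

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Spec;
use POSIX qw(strftime);

# Move a freshly printed PDF into an archive directory under a dated name.
# $label is a hypothetical short name for the site the page came from.
sub archive_pdf {
    my ($printed_file, $archive_dir, $label) = @_;

    die "No such file: $printed_file\n"     unless -f $printed_file;
    die "No such directory: $archive_dir\n" unless -d $archive_dir;

    # Name the copy after the site and today's date, e.g. site1-2012-04-06.pdf
    my $stamp  = strftime('%Y-%m-%d', localtime);
    my $target = File::Spec->catfile($archive_dir, "$label-$stamp.pdf");

    move($printed_file, $target)
        or die "Cannot move $printed_file to $target: $!\n";

    return $target;
}
```

A call might look like `archive_pdf('/tmp/output.pdf', '/home/me/reports', 'site1')`, where both paths are placeholders. Since the pages change every 10 days, a date stamp should be enough to keep the names unique; an existing file with the same name would be overwritten.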

Re: Is Perl capable of doing this?
by MidLifeXis (Monsignor) on Apr 06, 2012 at 18:26 UTC

    Would something along the lines of $sel->capture_entire_page_screenshot($filename, $kwargs) from WWW::Selenium be of use?


Re: Is Perl capable of doing this?
by zentara (Archbishop) on Apr 06, 2012 at 20:06 UTC
    You can probably use Gtk2::Webkit, as shown in Perl Web Browser using Gtk2 and WebKit. The problem with automating it is detecting when the page is fully loaded before taking your screenshot. To solve that problem, you can set up callbacks on certain available signals, as shown in Another YouTube Video Downloader using GtkWebkit:
        my $view = Gtk2::WebKit::WebView->new;
        $view->signal_connect( 'notify::progress' => \&notify_progress, undef );
        $view->signal_connect( 'load_finished'    => \&load_finished,   undef );

    Gtk2 can make PDF screenshots.

    By the way, Gtk2::Webkit's capabilities have been greatly enhanced by the recent addition of Glib::Introspection. You might want to ask on the Perl/Gtk2 mailing list how to use Introspection, or whether it could help in your case.

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: Is Perl capable of doing this?
by moritz (Cardinal) on Apr 06, 2012 at 20:36 UTC
Re: Is Perl capable of doing this?
by ww (Archbishop) on Apr 06, 2012 at 19:49 UTC
    Doable? With the right module, Perl can DO ANYTHING [Note1] ... even wash your windows!

    [Note1] Sometimes, though, you have to write the module or even tweak the infrastructure.

Re: Is Perl capable of doing this?
by Anonymous Monk on Apr 07, 2012 at 00:04 UTC
