PerlMonks  

WWW::Scripter performance and warnings

by bretelle (Initiate)
on Dec 12, 2013 at 14:26 UTC (#1066851=perlquestion)
bretelle has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone,

In this script:

use WWW::Scripter;
($w = new WWW::Scripter)->use_plugin(JavaScript);
$w->get('http://www.immoweb.be/FR/Rent.Estate.cfm?IdBien=2805206&xpage=1');
The get() function gives the expected results but it takes more than 60 seconds and produces these warnings:
Argument "\x{b}\x{31}" isn't numeric in addition (+) at /usr/local/share/perl/5.14.2/JE/Number.pm line 93.
Unquoted string "inf" may clash with future reserved word at (eval 6462) line 2.
Unquoted string "inf" may clash with future reserved word at (eval 6637) line 2.
Unquoted string "inf" may clash with future reserved word at (eval 6640) line 2.
...which I haven't been able to interpret so far. Using get() with other URLs produces other kinds of warnings. I know Scripter can be very slow (cf. #936386), but maybe if I manage to understand the warnings I'll be able to find a workaround?
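To see exactly how long get() takes and which warnings it emits, the call can be timed and its warnings collected into an array instead of scrolling past on STDERR. This is a minimal sketch, assuming Time::HiRes (a core module) is available; `time_and_collect_warnings` and `fetch_with_scripter` are hypothetical helper names, not part of WWW::Scripter. For reference, "\x{b}\x{31}" is a vertical tab (character 11) followed by the digit "1", so the first warning apparently means the addition at JE/Number.pm line 93 received a string that Perl does not consider numeric.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run a code ref and return (elapsed seconds, array ref of warnings).
# Warnings raised inside the code ref, including ones from string evals,
# are collected instead of being printed to STDERR.
sub time_and_collect_warnings {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $start = time();
    $code->();
    my $elapsed = time() - $start;
    return ($elapsed, \@warnings);
}

# Hypothetical helper: fetch one URL with WWW::Scripter and its JavaScript
# plugin, timing only the get() call. WWW::Scripter is loaded at run time,
# only when this sub is actually called.
sub fetch_with_scripter {
    my ($url) = @_;
    require WWW::Scripter;
    my $w = WWW::Scripter->new;
    $w->use_plugin('JavaScript');
    return time_and_collect_warnings(sub { $w->get($url) });
}
```

Calling `fetch_with_scripter` with the immoweb URL would then return the elapsed time together with the full list of warnings, which can be grouped or counted to see which kinds dominate.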

Re: WWW::Scripter performance and warnings
by Laurent_R (Parson) on Dec 12, 2013 at 15:46 UTC

    Hello,

    Maybe you could try this syntax:

    use WWW::Scripter;
    $w = new WWW::Scripter;
    $w->use_plugin('JavaScript');
    In particular, the single quotes around the word JavaScript may be important.
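Whether the quotes matter depends on `use strict`. The original one-liner does not enable it, and without strict 'subs' Perl treats an unknown bareword such as JavaScript as the string 'JavaScript', so quoting alone would not be expected to change what use_plugin receives. Under `use strict`, the same bareword is a compile-time error. A small sketch showing both cases with string eval:

```perl
use strict;
use warnings;

# Without strict 'subs', an unknown bareword is quietly treated as a string.
my $loose = eval q{ no strict 'subs'; my $name = JavaScript; $name };

# With strict 'subs' (part of 'use strict'), the same bareword does not compile.
my $strict_ok  = eval q{ use strict; my $name = JavaScript; 1 };
my $strict_err = $@;

print "without strict: $loose\n";
print "with strict: ", ($strict_ok ? "compiled\n" : "error: $strict_err");
```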
      Hello, and thanks,

      Using this syntax doesn't seem to help. The get() function still takes ages and I get the warning about Argument "\x{b}\x{31}".

Re: WWW::Scripter performance and warnings
by PerlSufi (Friar) on Dec 12, 2013 at 15:57 UTC
    I'm not totally clear on why you are using WWW::Scripter as opposed to WWW::Mechanize, except that WWW::Mechanize doesn't work with JavaScript very well. However, I was able to connect to that page just by doing the following:
    use strict;
    use warnings;
    use WWW::Mechanize;

    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.immoweb.be/FR/Rent.Estate.cfm?IdBien=2805206&xpage=1');
    $mech->dump_text;
    Additionally, if you want to crawl the site AND use some of the JavaScript, you can either:
    A) use WWW::Mechanize::Firefox
    B) inspect the various HTML elements of the page with Firefox's Firebug extension and use $mech->get() as I did above
    UPDATE:
    C) or go with Laurent_R's response ;)
      Hi PerlSufi,

      One element I need on the web page is written into it by a JavaScript script, so I need WWW::Scripter (or something similar) to execute that script.

      I guess Scripter is slow because there is a lot of JavaScript on this page.

      When I inspect this particular element with Firebug, I can see the name of the script. The question now is whether I can use that information in my Perl script so that WWW::Scripter executes only that one script. First I'll have to learn a bit more about JavaScript and WWW::Scripter.
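A possible first step is to list the external scripts the page references and match them against the script name seen in Firebug. This is a rough sketch, assuming a plain regex is good enough for this page (a real HTML parser would be more robust); `script_sources` is a hypothetical helper, and in practice the HTML could come from WWW::Mechanize's `$mech->content`.

```perl
use strict;
use warnings;

# Return the src attribute of every <script> tag in an HTML string.
# Regex-based, so it will miss unusual markup (unquoted src values, etc.).
sub script_sources {
    my ($html) = @_;
    my @srcs;
    while ($html =~ /<script\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi) {
        push @srcs, $1;
    }
    return @srcs;
}

my $sample = q{<script src="/js/a.js"></script>}
           . q{<script>document.write("inline");</script>}
           . q{<SCRIPT type="text/javascript" src='/js/b.js'></SCRIPT>};
print "$_\n" for script_sources($sample);
```

Once the relevant script is identified, fetching it with `$mech->get` should make its source readable, which may show where the needed value comes from.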

      Thanks a lot for your answers

      UPDATE: I also tried solution A, WWW::Mechanize::Firefox, which does the job but is no faster: get() still takes more than a minute.

        I guess Scripter is slow because there is a lot of JavaScript on this page.

        Or an infinite loop, or memory leaks... Even browsers (Firefox, Chrome, ...) do very little to protect against this.
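If an infinite loop in the page's JavaScript is possible, one way to keep the Perl script from hanging is to put a time limit on the get() call with Perl's alarm. This is a minimal sketch, assuming a Unix-like system where alarm and SIGALRM are available; `run_with_timeout` is a hypothetical helper name. Since JE runs the JavaScript as Perl code, the alarm should be able to interrupt it, although this only bounds the wait and does not make a slow page any faster.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds. Returns (1, $result) on success,
# or (0, $error) if the code died or the time limit was reached.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # clear any pending alarm if $code died early
    return $ok ? (1, $result) : (0, $err);
}

my ($ok, $err) = run_with_timeout(1, sub { sleep 5; 'finished' });
print $ok ? "completed\n" : "gave up: $err";
```

Wrapped around the original call, this would look like `run_with_timeout(90, sub { $w->get($url) })`, where the 90-second limit is an arbitrary choice for illustration.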

Node Type: perlquestion [id://1066851]
Approved by Corion
Front-paged by toolic