PerlMonks  

WWW::Scripter performance and warnings

by bretelle (Initiate)
on Dec 12, 2013 at 14:26 UTC (#1066851=perlquestion)
bretelle has asked for the wisdom of the Perl Monks concerning the following question:

Hello everyone,

In this snippet:

use WWW::Scripter;
($w = new WWW::Scripter)->use_plugin(JavaScript);
$w->get('http://www.immoweb.be/FR/Rent.Estate.cfm?IdBien=2805206&xpage=1');

The get() function gives the expected results but it takes more than 60 seconds and produces these warnings:
Argument "\x{b}\x{31}" isn't numeric in addition (+) at /usr/local/share/perl/5.14.2/JE/Number.pm line 93.
Unquoted string "inf" may clash with future reserved word at (eval 6462) line 2.
Unquoted string "inf" may clash with future reserved word at (eval 6637) line 2.
Unquoted string "inf" may clash with future reserved word at (eval 6640) line 2.

... which I have not been able to interpret so far. Using get() with other URLs produces other kinds of warnings. I know Scripter can be very slow (cf. #936386), but maybe if I manage to understand the warnings I'll be able to find a workaround?
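
To see how long a call takes and exactly which warnings it emits, the call can be wrapped in a small helper. The following is a minimal sketch using only core modules. The name run_with_diagnostics is hypothetical and not part of WWW::Scripter, and the two statements in the demo merely reproduce the same kinds of warnings in plain Perl; they say nothing about why JE emits them.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: runs a code ref, collects every warning it emits,
# and returns the elapsed wall-clock time (seconds) plus the warnings.
sub run_with_diagnostics {
    my ($code) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $start = time();
    $code->();
    return (time() - $start, \@warnings);
}

# Demo workload (an assumption for illustration, not the real get() call).
my ($elapsed, $warnings) = run_with_diagnostics(sub {
    my $text = "abc";
    my $sum  = $text + 1;      # triggers an "isn't numeric in addition (+)" warning
    no strict 'subs';
    my $word = eval q{ inf };  # triggers the 'Unquoted string "inf"' warning
});

printf "took %.3f s, %d warning(s)\n", $elapsed, scalar @$warnings;
print "  $_" for @$warnings;
```

For the real case, the body of the code ref would be the $w->get(...) call from the snippet above, which makes it easy to count how often each warning occurs.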

Re: WWW::Scripter performance and warnings
by Laurent_R (Prior) on Dec 12, 2013 at 15:46 UTC

    Hello,

    Maybe you could try with this syntax:

    use WWW::Scripter;
    $w = new WWW::Scripter;
    $w->use_plugin('JavaScript');

    The single quotes around the word JavaScript in particular may be important.
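
Some background on why the quotes can matter: without "use strict", Perl turns an unknown bareword such as JavaScript into the string 'JavaScript', so both spellings pass the same argument. Under "use strict" the bareword is a compile-time error, and if a sub named JavaScript happened to exist, it would be called instead. A minimal sketch in plain Perl, where show_arg is a hypothetical stand-in for a method like use_plugin:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a method that receives a plugin name.
sub show_arg { return "got: $_[0]" }

# Without strict 'subs', an unknown bareword is silently turned into a string.
my $loose = eval q{ no strict 'subs'; show_arg(JavaScript) };
print "$loose\n";

# With strict 'subs', the same bareword does not compile.
my $strict = eval q{ use strict; show_arg(JavaScript) };
print "strict error: $@" unless defined $strict;

# The quoted form behaves the same in both modes.
print show_arg('JavaScript'), "\n";
```

Since the original snippet does not enable strict, both forms should hand the same string to use_plugin, which would be consistent with the quoting making no difference to speed.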
      Hello and thank you,

      Using this syntax doesn't seem to help. The get() function still takes ages and I get the warning about Argument "\x{b}\x{31}".

Re: WWW::Scripter performance and warnings
by PerlSufi (Friar) on Dec 12, 2013 at 15:57 UTC
    I'm not totally clear on why you are using WWW::Scripter as opposed to WWW::Mechanize, except that WWW::Mechanize doesn't work with JavaScript very well. However, I was able to connect to that page just by doing the following:
    use WWW::Mechanize;
    use strict;
    use warnings;
    my $mech = WWW::Mechanize->new();
    $mech->get('http://www.immoweb.be/FR/Rent.Estate.cfm?IdBien=2805206&xpage=1');
    $mech->dump_text;

    Additionally, if you want to crawl the site AND use some of the JavaScript, you can either:
    A) use WWW::Mechanize::Firefox
    B) inspect the various HTML elements of the page with Firefox's Firebug extension and use $mech->get() as I did above
    UPDATE:
    C) Or go with Laurent_R's response ;)
      Hi PerlSufi,

      One element I need in the web page is written there by a JavaScript script, so I need WWW::Scripter (or something else) to execute this script.

      I guess Scripter is slow because there is a lot of JavaScript in this page.

      When I inspect this particular element with Firebug I can see the name of the script. The question now is whether I can use that information in my Perl script so that WWW::Scripter executes only that one script. I'll first have to understand a bit more about JavaScript and WWW::Scripter.

      Thanks a lot for your answers

      UPDATE: I also tried solution A, WWW::Mechanize::Firefox, which does the job but is no faster: get() still takes more than one minute.
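
As a first step toward using the script name seen in Firebug, the external scripts a page references can be listed from its source and compared against that name. This is only a rough sketch: the sample markup and file names are hypothetical, a regex stands in for a proper HTML parser, and it does not show how (or whether) WWW::Scripter can be restricted to a single script.

```perl
use strict;
use warnings;

# Hypothetical sample markup; in practice this would be the fetched page source.
my $html = <<'HTML';
<html><head>
<script type="text/javascript" src="/js/menu.js"></script>
<script src='/js/price.js'></script>
<script>var inline = 1;</script>
</head><body></body></html>
HTML

# Collect the src attribute of every external <script> tag.
# A regex is a rough shortcut here; a real HTML parser is more robust.
my @sources = $html =~ /<script\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;

print "$_\n" for @sources;
```

Inline scripts (those without a src attribute) are skipped by this pattern and would need separate handling.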

        I guess Scripter is slow because there is a lot of JavaScript in this page.

        Or an infinite loop, memory leaks, and so on. Even the browsers (Firefox, Chrome, ...) do very little to protect against this.
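
If a runaway script is the suspect, one generic Perl guard is to wrap the slow call in an alarm-based timeout. The sketch below uses a hypothetical helper named run_with_timeout and an endless loop as a stand-in for the slow call. Whether an alarm interrupts WWW::Scripter or JE cleanly is an assumption that has not been checked here, and a timeout does nothing about memory leaks.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a code ref, giving up after $seconds.
# Returns 1 on completion, 0 on timeout; other errors are re-thrown.
sub run_with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm is left pending after an error
    return 1 if $ok;
    return 0 if $@ eq "timeout\n";
    die $@;
}

# Demo: an endless loop stands in for a script that never finishes.
my $finished = run_with_timeout(1, sub { my $n = 0; $n++ while 1 });
print $finished ? "completed\n" : "gave up after timeout\n";
```

With the real call, the code ref would contain the get() request, and a return value of 0 would indicate that the page did not finish loading within the limit.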
