PerlMonks  

WWW::Scripter performance

by GaijinPunch (Pilgrim)
on Nov 07, 2011 at 02:40 UTC ( [id://936386]=perlquestion )

GaijinPunch has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks

I'm going to keep this rather general. I've got WWW::Scripter working, and it slurps a few pages for me. There's some rather nasty JavaScript on at least the first page that detects the client (and throws up a captcha if it doesn't like you). Scripter gets past this, but with a big caveat: performance is really slow. On my remote server it can take a good 60 seconds to parse through this script. The script itself is about 165k, so it is not trivial. Clearly a parser is the right answer, rather than attempting to figure out what the script does and reimplementing its logic myself.

Does this sound like I'm doing something wrong, or is this expected behavior? Each of the subsequent pages also detects whether JavaScript is enabled (by running a script), so my one idea of using Scripter for the first page and simply sharing the cookie with Mechanize for the rest won't work. I've tried both the JavaScript and Ajax plugins; they give me about the same speed.

Any pointers, or even a "yeah, Scripter can be slow" would be appreciated.
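
One way to see how much of the 60 seconds is script handling rather than plain fetching is to time the same request with and without the JavaScript plugin. The following is only a minimal sketch: the `timed` helper is a hypothetical name introduced here, the URL is taken from the command line, and the comparison assumes the page behaves the same on both requests. The difference between the two timings covers both downloading and executing the scripts, so it does not separate those two costs.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference and return
# (elapsed_seconds, result_of_the_code).
sub timed {
    my ($code) = @_;
    my $t0     = [gettimeofday];
    my $result = $code->();
    return ( tv_interval($t0), $result );
}

# The page to measure; pass it as the first command-line argument.
my $url = shift @ARGV;

# Only attempt the comparison if a URL was given and WWW::Scripter loads.
if ( defined $url && eval { require WWW::Scripter; 1 } ) {

    # Without any plugin: network fetch and HTML parsing only.
    my ($plain) = timed( sub { WWW::Scripter->new->get($url) } );

    # With the JavaScript plugin: the same fetch plus script handling.
    my ($scripted) = timed(
        sub {
            my $w = WWW::Scripter->new;
            $w->use_plugin('JavaScript');
            $w->get($url);
        }
    );

    printf "without JavaScript: %.2fs\n", $plain;
    printf "with JavaScript:    %.2fs\n", $scripted;
}
```

If the two numbers are close, the time is going to the network rather than to the JavaScript engine, and no amount of script tuning will help.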

Replies are listed 'Best First'.
Re: WWW::Scripter performance
by Anonymous Monk on Nov 07, 2011 at 03:02 UTC

    Does this sound like I'm doing something wrong, or is this expected behavior?

    You are not doing anything wrong

    This is expected; it says: "This is still an unfinished work. There are probably scores of bugs crawling all over the place."

    Mozilla, Microsoft ... have been working on their browsers for 20 years; Scripter dates from 5 April, 2009 (with an early start on 17 July, 2007).

Re: WWW::Scripter performance
by hardburn (Abbot) on Nov 07, 2011 at 14:59 UTC

    Yeah, Scripter can be slow.

    As AnonMonk pointed out, it's a work in progress. Last time I tried it (a year or so ago), it didn't do even the most basic caching of JavaScript and fetched it from the server every time. That's aside from the fact that any pure-Perl JavaScript implementation is going to be really slow, and the SpiderMonkey plugin is almost unusable.

    It's great that this exists, but it obviously needs work.


    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

      I got sidetracked yesterday and couldn't reply to this. Thanks guys... guess I'll have to bite the bullet and crunch the script myself if I want it to get any better. For now, it's livable. :D
