PerlMonks  

Re^3: Building a Spidering Application

by Your Mother (Chancellor)
on Jul 09, 2012 at 15:25 UTC (node #980705)


in reply to Re^2: Building a Spidering Application
in thread Building a Spidering Application

You don't need URI::ImpliedBase. The WWW::Mechanize::Link objects that Mech uses and returns have a method, url_abs, that covers this: it resolves the link's (possibly relative) URL against the base of the page it was found on. Of course, it's then up to the spider to decide whether query params are relevant, duplicates, or no-ops and, in the hacky world of HTML4.9, whether fragments are meaningful (but only a JS-aware Mech would be able to care about those).
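A minimal sketch of how a spider might combine url_abs with its own duplicate check. The helper names link_key and unique_links are hypothetical, and the dedup policy (drop the fragment, keep the query string, compare canonical forms) is an assumption for a non-JS spider, not something Mech does for you. In real use the Link objects come from $mech->links; here they are built by hand, which is normally Mech's job, so the example runs offline.

```perl
use strict;
use warnings;
use URI;
use WWW::Mechanize::Link;

# Hypothetical helper: turn a WWW::Mechanize::Link into a string key for a
# "seen" hash. Assumed policy (not Mech behaviour): drop the #fragment, since
# a non-JS spider fetches the same document either way, but keep the query
# string, so ?page=1 and ?page=2 stay distinct.
sub link_key {
    my ($link) = @_;

    # url_abs resolves the link's URL against the base of its page;
    # re-wrap the result as a plain URI object before editing it.
    my $abs = URI->new( $link->url_abs->as_string );
    $abs->fragment(undef);

    # canonical() lowercases scheme and host and drops a default port.
    return $abs->canonical->as_string;
}

# Hypothetical helper: keep the first link for each key, in page order.
sub unique_links {
    my %seen;
    return grep { !$seen{ link_key($_) }++ } @_;
}

# Normally these come from $mech->links after $mech->get(...); they are
# built by hand here only so the sketch runs without a network.
my $base = 'http://example.com/docs/index.html';
my @links;
for my $href ( 'intro.html', 'intro.html#setup', '../about.html',
    'HTTP://EXAMPLE.com:80/docs/intro.html' )
{
    push @links, WWW::Mechanize::Link->new(
        { url => $href, text => $href, tag => 'a', base => $base } );
}

print link_key($_), "\n" for unique_links(@links);
```

With a live Mech object the call would be unique_links( $mech->links ) after $mech->get($start_url). Keeping the query string in the key is the conservative choice: the spider may fetch a few pages twice, but it never merges ?page=1 and ?page=2 into one URL.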


Re^4: Building a Spidering Application
by pemungkah (Priest) on Jul 09, 2012 at 16:17 UTC
    Thanks! I didn't know about that one. The last time I wrote a spider was maybe six years ago, and as I recall it wasn't there then, though I may just have missed it at the time. Still handy for LWP folks, I guess.

      I'm pretty sure you're right about it not being there at that time.
