Re: POE Based web crawler or web spider ?

by rcaputo (Chaplain)
on Dec 03, 2012 at 20:03 UTC (#1006939)

in reply to POE Based web crawler or web spider ?

How robust/involved do you need it to be? Gungho includes a POE engine. The first of the POE Cookbook Web Client Recipes could be expanded into a small-scale crawler.
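
A rough sketch of what that expansion could look like, assuming POE::Component::Client::HTTP for the fetching and HTML::LinkExtor for pulling links out of each page. The helper names (`extract_links`, `enqueue`, `got_response`), the `ua` alias, and the depth limit are placeholders chosen for illustration, not part of the cookbook recipe, and there is no robots.txt handling or politeness delay here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use POE qw(Component::Client::HTTP);
use HTTP::Request::Common qw(GET);
use HTML::LinkExtor;
use URI;

my $MAX_DEPTH = 2;    # hypothetical limit on how many links deep to follow
my %seen;             # URLs that have already been queued

# Return the absolute http(s) links from <a href> tags in $html,
# resolved against $base, with fragments removed.
sub extract_links {
    my ($base, $html) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            return unless $tag eq 'a' && defined $attr{href};
            my $uri = URI->new_abs($attr{href}, $base);
            return unless ($uri->scheme // '') =~ /^https?$/;
            $uri->fragment(undef);
            push @found, $uri->as_string;
        }
    );
    $parser->parse($html);
    $parser->eof;
    return @found;
}

# Queue a URL once, carrying its depth along as the request tag.
sub enqueue {
    my ($kernel, $url, $depth) = @_;
    return if $depth > $MAX_DEPTH || $seen{$url}++;
    $kernel->post('ua', 'request', 'got_response', GET($url), $depth);
}

# Handle one finished request and queue the links it contains.
sub got_response {
    my ($kernel, $req_packet, $res_packet) = @_[KERNEL, ARG0, ARG1];
    my ($request, $depth) = @$req_packet;
    my $response = $res_packet->[0];

    return unless $response->is_success;
    return unless $response->content_type eq 'text/html';

    print "$depth ", $request->uri, "\n";
    my $html = $response->decoded_content // '';
    enqueue($kernel, $_, $depth + 1)
        for extract_links($request->uri, $html);
}

# Only crawl when a start URL is given on the command line.
if (my $start = shift @ARGV) {
    POE::Component::Client::HTTP->spawn(Alias => 'ua', Timeout => 30);
    POE::Session->create(
        inline_states => {
            _start       => sub { enqueue($_[KERNEL], $start, 0) },
            got_response => \&got_response,
        },
    );
    POE::Kernel->run();
}
```

Run it as `perl crawler.pl http://example.com/` (the file name is arbitrary). Every request is posted as soon as its link is discovered, so the HTTP component handles the fetches in parallel; a real crawler would want per-host throttling on top of this.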


Re^2: POE Based web crawler or web spider ?
by kulls (Hermit) on Dec 03, 2012 at 20:09 UTC
    Thanks for the quick reply.
    Gungho was written way back in 2008 and seems to have seen very little activity since.
    Should I go for it?

      I don't know how good it is. I just searched CPAN for your criteria. But things get better when we try them and tell their creators about both good and bad experiences. Or: ignoring modules helps CPAN stagnate.

      Edit: Unfortunately this requires more time than many people have to complete their work. I don't have a good solution for that. Over the long term, it might be better to rate jobs higher if they allow enough time for planning and research. And to rate employees higher if they manage their time well enough to achieve these things.

        I am just exploring Gungho in order to build a search engine from scratch.
        I have the following questions.
        • 1. I can't find anywhere that documents all the options Gungho accepts. Can you suggest a capable crawler config file (.yml) that I can reuse?
        • 2. Does Gungho parse sitemap.xml as well as robots.txt? If not, how can I do that?
        • 3. Does it crawl multiple levels (nested sites)?
          Any suggestions on this?
