PerlMonks  

Re^3: POE Based web crawler or web spider ?

by rcaputo (Chaplain)
on Dec 03, 2012 at 20:40 UTC (#1006947)


in reply to Re^2: POE Based web crawler or web spider ?
in thread POE Based web crawler or web spider ?

I don't know how good it is. I just searched CPAN for your criteria. But things get better when we try them and tell their creators about both good and bad experiences. Or: ignoring modules helps CPAN stagnate.

Edit: Unfortunately this requires more time than many people have to complete their work. I don't have a good solution for that. Over the long term, it might be better to rate jobs higher if they allow enough time for planning and research. And to rate employees higher if they manage their time well enough to achieve these things.

Re^4: POE Based web crawler or web spider ?
by kulls (Hermit) on Dec 05, 2012 at 19:32 UTC

    I am exploring Gungho in order to build a search engine from scratch.
    I have the following questions:
    • 1. I can't find a reference listing all the options Gungho supports. Can you suggest a full-featured crawler config file (.yml) that I can reuse?
    • 2. Does Gungho parse sitemap.xml as well as robots.txt? If not, how can I do that?
    • 3. Does it crawl multiple levels (nested sites)?
      Any suggestions on this?

      This reads like a "do it for me" type post. Please do some research since this is your task. Basic research will answer your questions.

        Hi,
        Unfortunately, Gungho -h does not provide anything!
        Raja
