PerlMonks  

Re^4: POE Based web crawler or web spider ?

by kulls (Hermit)
on Dec 05, 2012 at 19:32 UTC (#1007373)


in reply to Re^3: POE Based web crawler or web spider ?
in thread POE Based web crawler or web spider ?


I am exploring Gungho in order to build a search engine from scratch.
I have the following questions.

  • 1. I can't find anywhere that lists all the options Gungho accepts. Can you suggest a full-featured crawler config file (.yml) that I can reuse?
  • 2. Does Gungho parse sitemap.xml and honor robots.txt? If not, how can I do that?
  • 3. Does it crawl multiple levels (e.g. nested sites)?
    Any suggestions on this?


Re^5: POE Based web crawler or web spider ?
by marto (Chancellor) on Dec 05, 2012 at 19:37 UTC

    This reads like a "do it for me" type post. Please do some research since this is your task. Basic research will answer your questions.

      Hi,
      Unfortunately, Gungho -h does not provide anything!
      Raja

        kulls:

        If the extent of your research ability is trying '-h' on a command line, perhaps you should investigate a career in the fast food industry?

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

        Sorry, you called it with -h; I hadn't realised the vast extent of your research. If you think this is an acceptable amount of work, I suggest changing career/course ASAP. You've been a member here since 2005 and you've posted hundreds of times. At some point you're actually going to have to do some work of your own. If you can't work out what Gungho does from the resources available, what is your next step going to be?
