PerlMonks  

Re^2: How to extract links from a webpage and store them in a mysql database

by syedahmed.uos (Novice)
on Dec 05, 2006 at 14:34 UTC


in reply to Re: How to extract links from a webpage and store them in a mysql database
in thread How to extract links from a webpage and store them in a mysql database

This node falls below the community's threshold of quality. You may see it by logging in.
Re^3: How to extract links from a webpage and store them in a mysql database
by g0n (Priest) on Dec 06, 2006 at 12:36 UTC
    Step one is probably to work out an algorithm for what you want. Something like this, perhaps:

    • Create your database table with columns for 'link', 'depth' and 'read'.
    • Read the first page and store its base URL.
    • For each link in the page, compare its base to the original base URL.
    • If they match, add the link to the DB with depth 2 and read 'no'.
    • For each entry in the table where read eq 'no', read the page, set read to 'yes', and compare the base of each link on it to the original base URL.
    • If they match, add the link to the DB with depth 3 and read 'no'.
    • Repeat the last two steps for the new entries, adding the links found on depth 3 pages with depth 4.
    • End.
    You could end when you don't find any entries in the DB with depth <= 3 and read eq 'no'; that way it's easy to modify if you decide to read deeper.
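
The steps above could be sketched roughly as follows. This is a minimal Python sketch under several assumptions, not a finished crawler: an in-memory SQLite table stands in for the MySQL table, "base" is taken to mean the host part of the URL, duplicate links are ignored via a primary key, and `PAGES`, `LinkParser`, `extract_links` and `crawl` are hypothetical names invented for the illustration. The pages come from a small dictionary so the example runs offline; a real version would fetch them over HTTP.

```python
import sqlite3
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical stand-in for a website, so the sketch runs without a network.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="http://other.org/x">X</a>',
    "http://example.com/a": '<a href="/b">B</a>',
    "http://example.com/b": '<a href="/c">C</a>',
    "http://example.com/c": '<a href="/d">D</a>',
}


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(page_url, html):
    """Return the absolute URLs of all links found in html."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(page_url, href) for href in parser.links]


def crawl(start_url, fetch, max_depth=3):
    """Follow same-host links, reading only entries with depth <= max_depth."""
    base = urlsplit(start_url).netloc  # assumption: "base" means the host
    db = sqlite3.connect(":memory:")   # assumption: SQLite instead of MySQL
    db.execute(
        "CREATE TABLE links (link TEXT PRIMARY KEY, depth INTEGER, read TEXT)"
    )

    def store(page_url, html, depth):
        # Add every link whose base matches, marked as not yet read.
        for link in extract_links(page_url, html):
            if urlsplit(link).netloc == base:
                db.execute(
                    "INSERT OR IGNORE INTO links VALUES (?, ?, 'no')",
                    (link, depth),
                )

    # Links found on the first page get depth 2.
    store(start_url, fetch(start_url), 2)

    while True:
        # End when no unread entry with depth <= max_depth is left.
        row = db.execute(
            "SELECT link, depth FROM links "
            "WHERE read = 'no' AND depth <= ? ORDER BY depth, link LIMIT 1",
            (max_depth,),
        ).fetchone()
        if row is None:
            break
        link, depth = row
        db.execute("UPDATE links SET read = 'yes' WHERE link = ?", (link,))
        store(link, fetch(link), depth + 1)

    return db.execute(
        "SELECT link, depth, read FROM links ORDER BY depth, link"
    ).fetchall()


if __name__ == "__main__":
    for row in crawl("http://example.com/", lambda url: PAGES.get(url, "")):
        print(row)
```

With the sample pages, the link to other.org is never stored because its base differs, and the page found at depth 4 is stored but left with read 'no'. Reading deeper only means passing a larger `max_depth`, which is the point of the end condition suggested above.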

    --------------------------------------------------------------

    "If there is such a phenomenon as absolute evil, it consists in treating another human being as a thing."
    John Brunner, "The Shockwave Rider".

