Re^5: How to extract links from a webpage and store them in a mysql database
by chargrill (Parson) on Dec 21, 2006 at 13:26 UTC
And now a second bit of help, possibly a much bigger bit than the last. I'm not familiar with HTML::LinkExtor, and I don't really use LWP::UserAgent these days either, so I wrote something using my personal favorite for anything webpage-related, WWW::Mechanize. I also never quite understood your original algorithm. If it were me (and in this case it is), I'd keep track of URLs (and weed out duplicates) for a given link depth myself, in my own data structure, rather than inserting them into a database and fetching them back out to re-crawl them.

I'm also not clear from your spec whether you want URLs that are off-site. The logic for how this program handles that is clearly documented, so if it isn't to your spec, adjust it. Having said all that, here is a recursive link crawler. (Though now that I've typed out "recursive link crawler", I can't imagine this hasn't been done before, and I'm certain a search would turn one up fairly quickly. Oh well.)
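What follows is a minimal sketch of such a crawler. The start URL, the depth limit of 2, and the choice to stay on-site are illustrative assumptions, and a real crawler would also want error handling and politeness delays:

    #!/usr/bin/perl
    use strict;
    use warnings;

    use WWW::Mechanize;
    use URI;

    my $start     = shift || 'http://www.example.com/';  # start URL (placeholder)
    my $max_depth = 2;                                   # link depth limit (assumption)

    my $mech       = WWW::Mechanize->new( autocheck => 0 );
    my $start_host = URI->new($start)->host;
    my %seen;                                            # URLs already visited

    crawl( $start, 0 );

    sub crawl {
        my ( $url, $depth ) = @_;

        return if $depth > $max_depth;   # honor the depth limit
        return if $seen{$url}++;         # weed out duplicates

        $mech->get($url);
        return unless $mech->success && $mech->is_html;

        print ' ' x $depth, $url, "\n";  # or hand the URL to your database code

        # Grab the link list before recursing, since recursion reuses $mech.
        my @links = $mech->links;
        for my $link (@links) {
            my $abs = $link->url_abs or next;
            next unless $abs->scheme =~ /^https?$/;  # skip mailto:, javascript:, etc.
            next unless $abs->host eq $start_host;   # stay on-site; drop this line to follow off-site links
            $abs->fragment(undef);                   # treat page#foo the same as page
            crawl( $abs->as_string, $depth + 1 );
        }
    }

Run it as perl crawl.pl http://your.site/ and it prints each URL it visits, indented by crawl depth. The %seen hash is the "own data structure" mentioned above: it does the duplicate-weeding so nothing gets fetched twice.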
Inserting the links into a database is left as an exercise for the reader.
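For what it's worth, a minimal DBI sketch of that exercise might look like this (the crawler database name, the links table schema, and the credentials are all placeholder assumptions):

    use DBI;

    # Assumes a table like: CREATE TABLE links (url VARCHAR(255) PRIMARY KEY);
    my $dbh = DBI->connect(
        'DBI:mysql:database=crawler;host=localhost',   # placeholder DSN
        'user', 'password',                            # placeholder credentials
        { RaiseError => 1 },
    );

    my $sth = $dbh->prepare('INSERT IGNORE INTO links (url) VALUES (?)');

    # Call this from the crawler wherever it currently prints a URL.
    sub store_link {
        my ($url) = @_;
        $sth->execute($url);   # INSERT IGNORE quietly skips duplicates
    }

--chargrill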