PerlMonks
This is impossible to determine from the client side.
Suppose you are playing a text adventure, and you find
yourself in a maze. All rooms have the same description.
Just based on the description, you do not know whether
you have been there before or not. And even if you remember
all the pages, and say "if two pages have the same content,
I consider them to be the same, even if the URLs differ",
you can have a problem - for instance, the page may contain a
'counter' or a timestamp, so that the content is different
each time.
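
As a rough illustration of the counter/timestamp problem, here is a minimal sketch in Python (the problem is the same in any language) of one possible heuristic: blank out digit runs before hashing a page, so that a hit counter or a clock does not make every fetch look new. The names `fingerprint` and `SeenPages` are made up for this example, and replacing every digit run is an assumption about what "volatile" content looks like on a given site.

```python
import hashlib
import re

# Assumption: the parts of a page that change on every fetch are mostly
# numbers (hit counters, timestamps). Real sites may need other rules.
DIGIT_RUN = re.compile(r"\d+")


def fingerprint(body: str) -> str:
    """Hash a page body after masking digit runs and collapsing whitespace."""
    masked = DIGIT_RUN.sub("#", body)
    normalized = " ".join(masked.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class SeenPages:
    """Remembers fingerprints of pages the spider has already visited."""

    def __init__(self) -> None:
        self._seen = set()

    def is_new(self, body: str) -> bool:
        """Return True the first time a fingerprint is seen, False afterwards."""
        fp = fingerprint(body)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True


if __name__ == "__main__":
    seen = SeenPages()
    first = "<p>You are in a maze. Visitors: 41. Served at 12:00:01</p>"
    again = "<p>You are in a maze. Visitors: 42. Served at 12:00:07</p>"
    print(seen.is_new(first))   # first visit
    print(seen.is_new(again))   # same room, different counter and time
```

This treats the two maze pages as the same page, which is the intent. It also treats "Room 1" and "Room 2" as the same page, which is exactly the kind of false positive such a heuristic brings with it.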
You might be able to come up with some heuristics, but then you will have to accept that you will have false positives and false negatives. And make sure you check a site's robots.txt - that should prevent a spider from getting into a loop.

Of course, your question has nothing to do with Perl. You'd have to solve the same problems if you used any other language.

Abigail

In reply to Re: Infinite loop prevention for spider by Abigail-II