As suggested by Abigail-II in Re: Web Robot,
a polite robot should follow these rules:
- Obey robots.txt.
- Don't flood a site.
- Don't republish, especially not anything that might be copyrighted.
- Abide by the site terms of service.
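To make the first rule concrete, here is a minimal sketch of checking URLs against a robots.txt file offline. It assumes the CPAN module WWW::RobotRules (the robots.txt parser that LWP::RobotUA uses) is installed. The robots.txt content, the agent name and the URLs are made up for illustration.

```perl
use strict;
use warnings;
use WWW::RobotRules;

# Hypothetical robots.txt content, as a site might serve it.
my $robots_txt = <<'END';
User-agent: *
Disallow: /private/
END

# The agent name is an illustrative assumption; rules for "*" apply to any name.
my $rules = WWW::RobotRules->new('MyScraper/0.1');

# parse() takes the URL the robots.txt was fetched from, plus its content.
$rules->parse('http://example.com/robots.txt', $robots_txt);

# allowed() is true for URLs the rules permit and false for disallowed ones.
for my $url ('http://example.com/index.html',
             'http://example.com/private/secret.html') {
    printf "%s: %s\n", $url, $rules->allowed($url) ? 'allowed' : 'disallowed';
}
```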
I was further interested to read the following in
Chip Salzenberg's letter at geeksunite:
"Federal courts have upheld that web spiders must obey
the established robots.txt mechanism by which web site owners
limit automated access and that a failure to obey robots.txt …"
However, I'm confused about who robots.txt is intended for.
I understand that robots.txt applies to heavy-duty web spiders and
indexers, such as Google's robot. But does it also apply to
little screen-scraping tools written by private individuals?
For example, suppose I write a little tool using
(rather than, say, LWP::RobotUA or WWW::Mechanize::Polite) that simply collects
a number of web pages for me while I sleep.
Is it illegal or unethical for such a scraper to ignore robots.txt?
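For comparison, here is a minimal sketch of such an overnight collector built on LWP::RobotUA, assuming libwww-perl is installed. The agent name, the contact address and reading the URLs from the command line are illustrative assumptions. LWP::RobotUA checks each host's robots.txt before fetching from it and spaces out requests to the same host, so the polite behaviour costs only a few extra lines over a plain user agent.

```perl
use strict;
use warnings;
use LWP::RobotUA;

# Agent name and contact address are placeholders (assumptions);
# LWP::RobotUA refuses to start without both.
my $ua = LWP::RobotUA->new(
    agent => 'MyScraper/0.1',
    from  => 'me@example.com',
);

# delay() is measured in minutes: wait at least one minute
# between consecutive requests to the same host.
$ua->delay(1);

# With use_sleep on, the agent sleeps until the delay has passed
# instead of returning a 503 response for a too-early request.
$ua->use_sleep(1);

# Hypothetical input: the pages to collect, taken from the command line.
for my $url (@ARGV) {
    # Before the first request to a host, RobotUA fetches that host's
    # robots.txt; disallowed URLs get a 403 response without being fetched.
    my $res = $ua->get($url);
    if ($res->is_success) {
        printf "%s: %d bytes\n", $url, length($res->decoded_content // '');
    }
    else {
        warn "$url: ", $res->status_line, "\n";
    }
}
```

Run with no arguments, the script only builds the agent; pass URLs on the command line to fetch them.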
If a commercial company sells a tool that lets non-programmer
end users write little screen-scraping robots, is it unethical
or illegal for that product not to provide a mechanism that lets
its end users respect robots.txt?