This seems to be a really hot topic...

First, the article by Mr. Salzenberg alludes to a Federal law requiring spiders to obey the robots.txt file, but unfortunately he does not cite that law. It isn't even clear whether he means statutory or case law, and the distinction is crucial.

I think it's clear that if someone has taken the trouble to create a robots.txt file, he or she intends spiders to follow it. True, unless directories are password-protected, people or spiders can access them. However, that doesn't make it legal or ethical to ignore the requests in the robots.txt file.

To my way of thinking, "ethics" is more or less applying the Golden Rule, or not doing to people what you would find objectionable. It also includes not doing what *they* find objectionable, within reason. In this case, even if I don't mind spiders sucking up my Website, being ethical would require that I not do this to others, if they ask me not to.

The question of what a Webmaster intended is simple to resolve: ask him or her. People do things for various reasons. Some might be trying to keep Googlebot off their Websites but have no objection to you scraping them. Others might object to anyone using bandwidth unless a human being is doing it, perhaps to view the ads on the site, or for other imponderable reasons. So, to me, the best way to resolve the matter is to ask the Webmaster.
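Incidentally, robots.txt can already express the "keep Googlebot out, let others in" case: rules are grouped by User-agent, and a crawler follows the group matching its own name, falling back to the `*` group. Here is a minimal sketch using Python's standard `urllib.robotparser`, assuming a made-up robots.txt and a hypothetical scraper name `MyScraper/0.1` (in Perl, LWP::RobotUA does this check for you before each request):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real spider would first fetch
# http://<host>/robots.txt for the site it is about to crawl.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return True only if robots.txt permits `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    for agent, url in [
        ("Googlebot", "http://example.com/index.html"),
        ("MyScraper/0.1", "http://example.com/index.html"),
        ("MyScraper/0.1", "http://example.com/private/data.html"),
    ]:
        # Googlebot is shut out entirely; other agents only lose /private/.
        print(agent, url, polite_fetch_allowed(rp, agent, url))
```

Of course, this only tells a polite spider what the Webmaster asked for; nothing in the file stops a client that chooses to ignore it.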

As for the law, I doubt very much that anything most of us do will ever come to the attention of the authorities, unless someone sucks up a whole commercial Website and presents it as their own. Even then, the most likely result would be a sternly-worded "cease and desist" letter from Boyd, Dewey, Cheetham, & Howe, LLC.

I think that ethics is (are?) a personal issue where opinions are likely to vary widely. Even if everyone tries to abide by the Golden Rule, people are so widely divergent in their tastes that there is likely to be much disagreement. This is one of the reasons why laws are made - to enforce what is usually an unsatisfactory compromise.

In reply to Re: [OT] Ethical and Legal Screen Scraping by spiritway
in thread [OT] Ethical and Legal Screen Scraping by eyepopslikeamosquito