Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
Your flip comment about relativist ethics hits a sore point for me. Moral relativism does not absolve you from having a system of morals. And if you really believe it does, then depending on details you're a psychopath or a sociopath and I'd prefer that you be a long ways away from me (preferably in jail). For more on that read this post by me on another site explaining my views in more detail.

That said, I admit that different people will have different views of what is or is not ethical. And, as your examples illustrate, there are plenty of uses of an automated agent that most (including me) would agree justify ignoring robots.txt. But you have to think about what you're doing and why.

However at least one of your examples is questionable. Suppose that you're downloading 1-10 MB in short spans of time, and the machine that you're hitting is a public webserver hosted on someone's personal machine. A machine whose bandwidth is no better than your own. While you might not think that that's an issue, the webserver operator may not agree. Nor may other users. Nor may other people who are hosted on that machine. This applies for both personal webservers and also many small businesses. You may not know enough to determine whether this is an issue - but the website owner's opinion is right there in robots.txt.

Suppose, to take another example of yours, that you are scraping a pay-only portion of a site for personal use. Even though you believe that your use is ethical, the website owner has no way of knowing that. The website owner may or may not have had you click through an agreement that you won't use automated bots. Now what you're doing may be illegal (you are violating a contract) and is of questionable morality despite your justifications (you are breaking your word). Furthermore you run a real risk of having the website owner notice what you're doing and block you. (Whether or not there is an agreement.)

This doesn't just happen at pay sites. I know of multiple people who have found themselves shut out of Google after testing a bot there.

Now I'll agree that most websites don't monitor things that closely. As a practical matter, plenty of people ignore robots.txt and don't get caught. But I still think that if you are writing a bot, then you should either pay attention to robots.txt or have a good reason not to.

In reply to Re^4: [OT] Ethical and Legal Screen Scraping (and courtesy) by tilly
in thread [OT] Ethical and Legal Screen Scraping by eyepopslikeamosquito

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others drinking their drinks and smoking their pipes about the Monastery: (5)
    As of 2020-05-26 10:46 GMT
    Find Nodes?
      Voting Booth?
      If programming languages were movie genres, Perl would be:

      Results (150 votes). Check out past polls.