PerlMonks  

You seem to be confused about ethics.

Behaving ethically is not defined by anyone else's ability to prove that you did or did not behave ethically. If you do something unethical, it is unethical whether or not you're caught. If you find yourself having to make excuses for your behaviour, the odds are very good that you're behaving unethically.

If you write a program to scrape another website, even if it is just for personal use, courtesy and ethics say that you should pay attention to robots.txt. Sure, you can do by hand anything that you automate. But when you automate you're likely to do a lot more of it than when you do it by hand. And you're likely to do it a lot faster. This has implications for the website that you're visiting, and it makes sense that website operators would ask you to be particularly polite to them.
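To make "pay attention to robots.txt" concrete, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt content, the "example-bot" user agent, and the example.com URLs are all made up for illustration; a real spider would fetch the file from the site it is about to visit rather than parse a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, standing in for what a real site might serve.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the site's rules allow this agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask before every request, and skip anything the operator has disallowed.
allowed_public = may_fetch(parser, "example-bot", "https://example.com/public/page.html")
allowed_private = may_fetch(parser, "example-bot", "https://example.com/private/data.html")

# Crawl-delay is a nonstandard but common directive; None means the site did not set one.
requested_delay = parser.crawl_delay("example-bot")
```

The point of the sketch is only that the check is cheap: a few lines before each request are enough to respect what the operator asked for.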

If you say, "Oh, this is just for personal use" and turn a poorly written spider loose on a site, you're being rude and unethical. The website operator may well choose to repay your rudeness in kind and block you. They don't even have to go to court to do it either - they just notice that you're a bandwidth hog and lock you out.
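One way to avoid being that bandwidth hog is to throttle your own requests. The following is a rough sketch, not a recommendation of any particular interval: the Throttle class and its parameters are hypothetical names invented here, and the right delay depends on the site (a Crawl-delay in robots.txt, if present, is a reasonable starting point).

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the behaviour can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first one

    def wait(self):
        """Block until min_interval has passed since the last call; return seconds slept."""
        now = self._clock()
        waited = 0.0
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                waited = remaining
                now = self._clock()
        self._last = now
        return waited
```

Usage would be to create one Throttle per site and call wait() immediately before each request, so that a loop over many pages cannot hit the server any faster than the interval you chose.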

But you asked several hypothetical questions. Here are some not-so-hypothetical answers. Someone might recognize that you didn't just use your browser because of the speed with which you hit the site, because of your user agent, or because they get access to your computer and find the program that you used to do it. There are other things that might strike them as suspicious, but the above is a good starting list.


In reply to Re^2: [OT] Ethical and Legal Screen Scraping by tilly
in thread [OT] Ethical and Legal Screen Scraping by eyepopslikeamosquito
