PerlMonks  

Re^2: [OT] Ethical and Legal Screen Scraping

by Your Mother (Canon)
on Jul 25, 2005 at 16:07 UTC (#477879)


in reply to Re: [OT] Ethical and Legal Screen Scraping
in thread [OT] Ethical and Legal Screen Scraping

How can someone forbid you to save a personal backup copy of a publicly available document?

This whole thing is a can of worms, and it may not be sorted out clearly and legally for decades. However, copyright (in the US anyway) means the rights to all copies, which includes digital ones. So if a site's terms of service and copyright notice say you can't save a copy of the site for personal use, then you can't do it legally.

Who's to stop you? Probably no one. But just because something is technically possible doesn't suddenly make it ethical. It would be technically trivial for me to shoot out my neighbor's windows at 3am with a pellet gun from 250 yards and be all but assured of getting away with it. Technology, ability, doesn't change morality.

The more respect--and peer pressure for continued mutual respect--we have for these sorts of things, the more open it can all be and remain. The more the windows are broken, the more the government cops have excuses to climb in and "protect" us.


Re^3: [OT] Ethical and Legal Screen Scraping
by radiantmatrix (Parson) on Jul 25, 2005 at 17:49 UTC

    Caching has traditionally been considered acceptable, legal, and ethical. Most web browsers do it on a small scale, and Google does it on a large scale. If the purpose of a scraper is to enable, for example, an "offline reader", there is no ethical dilemma.

    Even long-term caches that are not shared with others are not an ethical dilemma so long as the material on the web site being cached remains publicly available. Beyond that point, there is a dilemma, and one has to consider whether the good of continued access to that data outweighs the good of complying with the copyright-holder's wishes. I'd take that on a case-by-case basis.

    <-radiant.matrix->
    Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
    The Code that can be seen is not the true Code
    "In any sufficiently large group of people, most are idiots" - Kaa's Law

      Yep, I agree. Google in particular does exactly the right thing, I think. They allow you to control how they handle your content almost completely. You can exclude your pages from their caches while still including them in their search results.
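For anyone writing a scraper that keeps its own cache, the opt-out described above is conventionally expressed with a `noarchive` directive in a robots meta tag (`<meta name="robots" content="noarchive">`). The following is only a minimal sketch of how a scraper might honor that directive before saving a page; the names `RobotsMetaParser` and `may_cache` are made up for illustration, and it looks only at the meta tag, not at robots.txt or HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: treat the generic "robots" name and "googlebot" alike.
        if attr.get("name", "").lower() not in ("robots", "googlebot"):
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def may_cache(html):
    """Return False if the page asks not to be archived, True otherwise."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives
```

A scraper could call `may_cache(page_source)` after fetching a page and skip writing it to its local cache when the result is `False`. Whether honoring the directive is legally required is a separate question from the one this sketch addresses.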

      Magnetic or optical caches on disk are just a backup for my memory. It is not a lapse to remember published information past the point at which someone wishes you to remember it, nor is it a lapse to remember it with the aid of a recording.

      Just because I kept a copy on my hard drive, on a CD, in a printout, or in some handwritten notes doesn't mean I'm obligated, under any ethical system I'm aware of, to destroy it just because my original source stopped publishing. The copyright holder's wishes are irrelevant.

        I happen to agree with your evaluation. However, if I try to evaluate things from a neutral ethical standpoint, I can see a potential dilemma in caching when material has been retracted from publication.

        However, I feel that the value of memory augmentation, combined with the value of preserving information in case an author is pressured (legally or otherwise) into removing it from publication, far outweighs any potential harm that might be perceived by copyright holders.

        I also see it like ripping CDs I own to my hard disk: I am making an accessible copy of something I have a right to access. If my CD is destroyed, it is still ethically correct to keep my Vorbis files. Likewise, if content is removed from the 'net, I see no issue with maintaining the cache of it.

        <-radiant.matrix->
        Larry Wall is Yoda: there is no try{} (ok, except in Perl6; way to ruin a joke, Larry! ;P)
        The Code that can be seen is not the true Code
        "In any sufficiently large group of people, most are idiots" - Kaa's Law
Re^3: [OT] Ethical and Legal Screen Scraping
by willyyam (Priest) on Jul 27, 2005 at 13:04 UTC
    This whole thing is a can of worms, and it may not be sorted out clearly and legally for decades. However, copyright (in the US anyway) means the rights to all copies, which includes digital ones. So if a site's terms of service and copyright notice say you can't save a copy of the site for personal use, then you can't do it legally.

    Not really true. Unless you explicitly agree to a TOS or EULA (thus entering into a contract for the use of the material), a TOS cannot overturn fair use. Single copies made by individuals, for individual use, are well within fair use.

    The ruling regarding robots.txt is a bad ruling made by a judge who was trying to make the best decision for the one instance before the court, not to establish good law for all cases.

      I think you have been misled on the term, though it is a very slippery one that can be interpreted widely. Single copies made by individuals are not fair use. Fair use covers quotes or excerpts that are brief in relation to the whole, aren't for direct gain, and don't hurt the copyright holder.

      Somehow, since Napster, stealing has gained an air of common-wisdom legality as long as it benefits only one person and not a company. If you buy something and make a copy for personal use, that's a different matter, depending on the mood of the court and the medium involved.
