PerlMonks  

The Ethics of Webbots

by Vautrin (Hermit)
on Sep 17, 2004 at 17:20 UTC ( #391844=perlmeditation )

I've got a project that would benefit from spidering and scraping a web site. Unfortunately, the web site I want to spider and scrape has very explicit TOS and robots.txt: the info I want is off limits. I want to cancel this project because of this, but management is insistent. Is it ethical to spider / scrape a site that says to stay away? Are there any possible legal ramifications? Should I tell my boss where to put his data in this job market?

Re: The Ethics of Webbots
by gjb (Vicar) on Sep 17, 2004 at 17:34 UTC

    Yes, IMO this is unethical.

    You can run into trouble. Once I inadvertently violated Google's TOS and our IP address was blacklisted. Unfortunately this happened to be the IP address of the proxy that served the whole company I worked for. It took a letter of apology to Google to get this corrected.

    Just my 2 cents, -gjb-

Re: The Ethics of Webbots
by tilly (Archbishop) on Sep 17, 2004 at 17:37 UTC
    Yes, there are possible legal ramifications to violating the ToS. Doubly so if you're going to engage in copyright infringement with the data that you scrape. The most likely consequence, though, is that they'll detect you and block your site and you will not be able to reach them at all.

    You're right that it is unethical to do this project.

    Whether it is worth your job is your choice. Two factors to consider though. The first is that my experience is that slimeballs are usually not just slimeballs in one way - how they want you to treat others is how they'll also treat you when push comes to shove. (A tip when you eat out with people. Watch how they treat the waiter/waitress. That tends to be very revealing about what those people are really like...)

    The second is that your practical ability to take a stand on principle strongly depends on your personal circumstances. If nobody depends on you and you have strong skills, then you can do it pretty safely. The current job market (see http://jobs.perl.org/) is reasonable. However if your background is weaker or if you have a family depending on you, then it becomes much harder to walk away from a currently paying job.

    My suggestion, without knowing your exact circumstances, is that if you cannot afford to take an immediate stand on principle, when an employer does something that you don't want to stand for, quietly start shopping your resume around. In fact even if you think that you can afford an immediate stand on principle, you may feel more comfortable making sure that you have a good fallback before burning any bridges.

Re: The Ethics of Webbots
by Albannach (Prior) on Sep 17, 2004 at 17:53 UTC
    As has already been said, this is clearly unethical, and perhaps can even be interpreted as illegal (depending on the legislation in place in your jurisdiction).

    I'd have to ask your boss: why not just contact the owners of this site and look into an agreement with them? Maybe this has not occurred to him at all.

    On the other hand maybe he already has done this but his offer was rejected, or they wanted more money than he wanted to spend. Either way, proceeding to scrape the site after failed negotiations would just give the other side more evidence against your side in court. You might want to point out the potential for legal costs to your boss.

    --
    I'd like to be able to assign to an luser

Re: The Ethics of Webbots
by jbware (Chaplain) on Sep 17, 2004 at 17:57 UTC
    Yeah, what gjb and tilly said. I've also found that bringing to light the legal implications of such a project can have some impact. I mean, how would your boss feel if it was their project that blocked the whole company from a crucial website, or brought a lawsuit (especially if they knew it was a possibility)? IANAL, but they seem to scare people pretty well when they know what they're doing is wrong. And as always, documentation is key. Your warnings in email can be a nice way to cover your back if/when the company looks for a scapegoat.

    -jbWare
Re: The Ethics of Webbots
by dragonchild (Archbishop) on Sep 17, 2004 at 18:47 UTC
    All the above advice is absolutely on-point. Some advice for when your boss doesn't listen:
    1. Get the request in writing, if you haven't already done so.
    2. Get your objections in writing, if you haven't already done so.
    3. Get his overruling of your objections in writing, if you haven't already done so.
    4. Go talk to HR, if you feel you can without endangering your job. Let them handle it.
    5. If all else fails, you can do one of two things:
      1. Contact the site you're being asked to scrape and let them know the position you're in.
      2. Do the job anyway while actively searching for another job.

    The idea is to have your ass completely covered. That's what the paper trail is for. You can term it as "I just want to know exactly what you want and be able to refer to it without bothering you." Signatures are best, but email is often good enough.

    I cannot emphasize this part strongly enough - Make hardcopies of all email and photocopies of all signed documents, then store them offsite. These are your only protection when the company gets sued and the $h!t starts to roll downhill.

    If you're absolutely paranoid, you might even want to talk to a lawyer, just to make sure the jurisdiction you're in doesn't have some crazy laws that would skew the issue.

    ------
    We are the carpenters and bricklayers of the Information Age.

    Then there are Damian modules.... *sigh* ... that's not about being less-lazy -- that's about being on some really good drugs -- you know, there is no spoon. - flyingmoose

    I shouldn't have to say this, but any code, unless otherwise stated, is untested

      I'll add, from a personal work experience that ended up in a multi-million dollar lawsuit, that dragonchild's advice above is excellent and should be heeded.

      I'll second the above endorsement of dragonchild's method. This method has prevented many a lawsuit and/or grievance where I work, and has drastically shortened several others. You may also find that, once you put your objections - backed up by the ToS from the site - in writing, that the original request magically disappears. :-)

      Get it in writing. Get it in writing. Get it all in writing.

      --J

      This is good advice, but remember that an illegal act is illegal regardless of who told you to do it. To take it to the extreme: Yes officer, I shot him but my boss told me to. I have a letter here proving that I did not want to and that he overruled me...

      The paper trail may get you out of trouble inside your company, but would be an admission that you knew you were wrong if the other company got hold of it. Tread with care - stay legal.

      --tidiness is the memory loss of environmental mnemonics

      All of the parent poster's thoughts are valid, and I agree.

      A few additional thoughts, though:

      • Bring the ToS violation issue up with the objection "due to potential legal risk to the company, this should be run past the legal department." Cc: the legal department on that memo. That will come off as concern for the company and not a refusal to do the work. If your boss is angry, you can always claim "I'm looking out for both our jobs", and legal will likely agree. Not only that, it will CYA against being personally named if someone does sue, because you were acting on legal counsel.
      • Hardcopies of e-mail aren't enough. Bcc: yourself on all communication so all is properly threaded. Take snapshots of your mail folder daily, and digitally sign and encrypt them before moving them offsite. Cryptographically sign all of your outgoing messages. If possible, request your boss do the same.
      • If your boss insists on doing something you feel might be illegal (or at least likely to get you or the company sued), make an appointment with your legal department. They are required, in most places, to keep your identity sealed. Your boss is also unlikely to mess with them. :)
      • If the concern is ethical, make an appointment with your boss' direct manager. Explain that you don't want to get your boss in trouble, but explain that you "have a difference of opinion" over the ethics and ask his/her help in resolving it. Get the resolution in writing. Not only does that CYA to another level, if done tactfully it will be noted positively.
      Some of these actions are a small risk to your job if you have a vengeful boss. However, most of them allow for recourse of "wrongful termination", since you have a clear paper-trail showing that you were not refusing but raising ethical and legal concerns. Depending on where you live, you might be further protected by "whistle-blower" legislation.

      IANA Attorney, so please check with one if you're interested in what your precise rights under the law may be.

      --
      $me = rand($hacker{perl});

      All code, unless otherwise noted, is untested
      "All it will give you though, are headaches after headaches as it misinterprets your instructions in the most innovative yet useless ways." - Maypole and I - Tales from the Frontier of a Relationship (by Corion)
Re: The Ethics of Webbots
by Anonymous Monk on Sep 17, 2004 at 21:22 UTC
    It depends on what you do with it. I scrape lots of websites which have TOS against that, but it's for my personal use only, and I'm just saving myself a few clicks of the mouse. I'm not generating any extra traffic or doing anything malicious, so I don't consider it unethical (I'm just browsing).

    However, if I were to try and somehow make money with this data, I would consider it unethical (and, IANAL, but I'd probably be legally liable).

Re: The Ethics of Webbots
by toma (Vicar) on Sep 20, 2004 at 05:02 UTC
    Another approach to consider is to scrape the site manually. It would be a lot of clicks, but if the site is not too large or dynamic, this approach might get you out of a bad situation.

    If you have to do a lot of clicking, you can save your fingers from injury by using a footswitch instead of a mouse button. Take your shoe off, and click with your big toe. This motion should be the same as one of the motions involved in walking, and should not wear out the meat-ware. One type of footswitch is sometimes called a treadle.

    It should work perfectly the first time! - toma
Re: The Ethics of Webbots
by petdance (Parson) on Sep 21, 2004 at 03:11 UTC
    The bigger issue is your long-term employment. You need to evaluate your relationship with your boss, and with your company.

    What if your girlfriend said "Hey, let's go shoplifting" (assuming you're not Perry Farrell). You think "Do I want to maintain a relationship with this person who wants me to engage in illegal, unethical activities?" That's the situation you're in with your boss.

    Make no mistake, this issue is between you and your boss, not you and "management." Your boss should be protecting you from you having to do anything unethical or illegal. If he/she is not, then you have a shitty boss, and it may well be time to move on anyway.

    My bottom line is: no, you should not do anything in the name of the company that violates your own code of ethics. Let 'em fire you for insubordination. They won't be eager to fire you, in general, nor will they relish the thought of you going to the unemployment office explaining why you were canned. Does your company have a code of ethics? A higher-up corporate parent you can talk to? My company has a code of ethics I have to sign yearly.

    I was in a situation like this before, at a different company, now out of business, when I was asked to write a program to create false sales reports to give to a supplier. I refused to do it, and was fully prepared to get canned. There were no repercussions, perhaps because my boss did the project instead of me.

    xoxo,
    Andy

      Sorry, but KUDOS to the Perry Farrell ref, petdance!!!

      Excellent advice by all - many thanks as I've been asked this same stuff before and have been able to be in the position to give a very flat, 'No.'
Re: The Ethics of Webbots
by poqui (Deacon) on Sep 21, 2004 at 16:55 UTC
    IANAL as well, but it has been my experience that knowledge of another person's intent to commit a crime, and not acting to stop that person, can be prosecuted as Complicity; which in some jurisdictions carries the identical sentence to committing the crime.
Re: The Ethics of Webbots
by inman (Curate) on Sep 22, 2004 at 08:24 UTC
    Your situation depends on two things. Firstly the relationship between the website owners and you. Secondly, what you do with the information.

    If you are scraping the website of a competitor, someone who derives an income stream from the information or someone who has already paid someone else for the content on their website then you probably won't get very far. It is also the case that if you are re-selling the information that you get or not attributing the source of the information then you would be in trouble (breach of copyright, passing off etc.).

    If, on the other hand, you are crawling a website that is otherwise in the public domain (e.g. government websites) then it may be worth getting in touch with the website owners and talking to them about it. Content owners are trying harder to provide machine to machine services such as web services, RSS feeds etc. A small licence fee later and you could end up with a web service feed rather than a web site scrape.

    As a general note from a technical point of view - be nice when you are scraping/indexing.

    • Index a site slowly - leaving a second or two between requests will take the pressure off the website (there's nothing like a DoS attack to annoy people). Varying the spacing in between calls also stops the requests clumping together in the website logs and drawing attention to your spider.
    • Set the User-Agent header in your requests to include a contact e-mail so that the website admin can get in touch with you.
    • Index a site only when necessary. The chances are that most of the site is static with only a small amount of info changing. The guiding principle is Less is more.
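
    The three points above, plus a robots.txt check, could look roughly like this. It is only a sketch in Python using the standard library, not anyone's production spider. The agent string, the contact address and all function names are made up for illustration, and "only when necessary" is approximated here with a conditional GET (If-Modified-Since), which assumes the server answers with 304 Not Modified for unchanged pages.

```python
import random
import time
import urllib.error
import urllib.request
from urllib.robotparser import RobotFileParser

# Hypothetical agent string: name your spider and give a contact address.
USER_AGENT = "ExampleSpider/0.1 (+mailto:webmaster@example.com)"


def allowed_by_robots(robots_txt, url, agent=USER_AGENT):
    """Return True only if the given robots.txt text permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def polite_delay(base=1.5, jitter=1.0):
    """Seconds to wait before the next request: base plus a random extra."""
    return base + random.uniform(0.0, jitter)


def build_headers(last_modified=None, agent=USER_AGENT):
    """Headers for one request; ask for the page only if it has changed."""
    headers = {"User-Agent": agent}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def fetch(url, last_modified=None):
    """Fetch url once; return the body, or None on 304 Not Modified."""
    request = urllib.request.Request(url, headers=build_headers(last_modified))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # unchanged since last visit, nothing to re-index
            return None
        raise


def crawl(urls, robots_txt):
    """Yield (url, body) for permitted URLs, pausing between requests."""
    for url in urls:
        if not allowed_by_robots(robots_txt, url):
            continue  # the site said to stay away, so stay away
        yield url, fetch(url)
        time.sleep(polite_delay())
```

    If robots.txt disallows the pages you are after, as in the original question, crawl() simply skips them; no amount of politeness in the other settings changes that.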

    As for your relationship with your boss: you need to document your concerns and the approach that you are taking. Remember that if you can demonstrate that they knew what was happening then they get it in the neck and not you.

Node Type: perlmeditation [id://391844]
Approved by Albannach
Front-paged by Courage