PerlMonks  

E-mail Redirect (for protecting addresses from E-mail-Address-Collecting Bots)

by davisagli (Scribe)
on Jun 25, 2001 at 21:59 UTC (#91367=perlcraft)

#!/usr/bin/perl -wT

# E-mail Redirect (for protecting addresses from E-mail-Address-Collecting Bots)
# by David Glick [davisagli], 6/25/2001

# When given an e-mail address in the form "[user],[domain]",
# this script returns an HTTP redirect to "mailto:[user]@[domain]".

# This can be used to prevent spam-bots from finding e-mail
# addresses in HTML links; for example, instead of linking to
# "mailto:me@mydomain", you can link to "this_script.pl?me,mydomain".

# Comments/improvements welcome; I don't have much experience with CGI.

# Update 6/25/2001: The security risk that [bikeNomad] pointed out
# shouldn't be an issue now.  Also implemented his other suggestions.
# Thanks much, bikeNomad!

use strict;
use warnings;
use CGI qw/:standard/;

# A query string without "=" (e.g. "?me,mydomain") arrives as 'keywords'.
$_ = param('keywords');
# "\$" is escaped so it is a literal dollar sign, and "-" goes last so
# that "+-/" is not read as a range (which would also admit ",").
my ($user, $domain) = m{^([\w!\$'*+/=^.-]+),([\w!\$'*+/=^.-]+)$};
print redirect( -uri => "mailto:$user\@$domain" )
    if defined($user) && defined($domain);

Replies are listed 'Best First'.
Re: E-mail Redirect (for protecting addresses from E-mail-Address-Collecting Bots)
by BooK (Curate) on Jun 25, 2001 at 23:08 UTC
(ichimunki) Re: E-mail Redirect (for protecting addresses from E-mail-Address-Collecting Bots)
by ichimunki (Priest) on Jun 25, 2001 at 22:41 UTC
    I wouldn't use this for security reasons (not to mention that it may not foil a decent spider because it does eventually produce the correct mailto: URL).

    Use the CGI interface to get the parameters from the URL rather than $ENV, especially since you're pulling in the module anyway (I see this is much better now).

    Use taint mode, just to be safe-- and I see that it's there, but you are untainting almost anything that might get passed in.

    Don't allow non-word characters in your input variables-- they aren't necessary in an email address are they?

    You don't even need to pass the domain as "foo.com"; just "foo" will do, and you can append ".com" in your script.

    Final thought: why even allow for input variables... this is the cause of the security problems. Why not just hardcode your own address into the script, so that the rest of us will not start pointing to your script for our own email addresses?
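The hardcoding suggestion above could look something like the following minimal sketch. The address `me@mydomain.example` and the helper name `redirect_response` are placeholders invented for illustration, and the headers are written by hand rather than through CGI.pm, so treat it as one possible shape and not as the original author's script.

```perl
use strict;
use warnings;

# Placeholder address for illustration; not taken from the thread.
my $ADDRESS = 'me@mydomain.example';

# Build the CGI response for a redirect to the fixed address.
# No request data is read, so there is nothing to validate or untaint.
sub redirect_response {
    return "Status: 302 Found\r\n"
         . "Location: mailto:$ADDRESS\r\n"
         . "\r\n";
}

print redirect_response();
```

Because the script ignores its query string entirely, nobody else can point links at it for their own addresses. It still hands the real address to any client that reads the Location header.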
      Hi ichimunki,

      "Don't allow non-word characters in your input variables-- they aren't necessary in an email address are they?"

      ...is what you are asking, take a look at this: sender@registry-A.registry-1.organization-X, which is taken from RFC 822. I have seen e-mail addresses containing numbers and underscores, too. E-mail addresses definitely don't have to consist only of word characters.



      --
      there are no silly questions
      killerhippy
        You are absolutely correct. Any untainting regex should allow @, - and ., which are non-word characters found in valid email addresses, but only in specific positions. Numbers and underscores already match \w in a regex, so those don't need to be considered separately.
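One way to read "only in specific forms" is sketched below: word characters everywhere, with a single `.` or `-` allowed only between runs of word characters. The helper name `parse_address` and the exact pattern are assumptions made for illustration. It is deliberately stricter than RFC 822 and is not a full address validator.

```perl
use strict;
use warnings;

# Hypothetical helper: split "user,domain" into two validated parts.
# Returns an empty list when the input does not fit the pattern.
sub parse_address {
    my ($input) = @_;
    return unless defined $input;

    my ($user, $domain) = $input =~ m{
        \A
        ( \w+ (?: [.-] \w+ )* )   # user: word runs joined by a single . or -
        ,
        ( \w+ (?: [.-] \w+ )* )   # domain: same shape
        \z
    }x;
    return unless defined $user;
    return ($user, $domain);
}

# The RFC 822 example quoted above, in the script's "user,domain" form.
my ($user, $domain) =
    parse_address('sender,registry-A.registry-1.organization-X');
print "mailto:$user\@$domain\n" if defined $user;
```

Inputs such as `a,,b` or `me..x,y` return an empty list, so a caller can simply decline to redirect.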
Re: E-mail Redirect (for protecting addresses from E-mail-Address-Collecting Bots)
by bikeNomad (Priest) on Jun 25, 2001 at 22:15 UTC
    Why would you possibly want all those funny characters in email addresses? I'm assuming that there is no encoding of the query string going on in the web server; if there is, you may have to deal with decoding the passed-in text. And doesn't CGI have built-in routines that decode the passed parameters? Perhaps you should be using them instead. Update: as tye points out, interpolating won't execute. Removed that part so as not to give bad advice.

      "mailto:$user\@$domain" gets parsed by Perl into 'mailto:'.$user.'@'.$domain so there isn't any difference between the two. Perl doesn't execute code in the case of "this `date` string", and even if it did, $date='`date`'; "this $date string" still wouldn't execute code.

      My question on the original code is "What good does it do?" If a robot is prowling the web for e-mail addresses, why wouldn't it follow the link and get the e-mail address in this case? Do most of these robots skip links with "?" in the URL? Or do they not harvest e-mail addresses when given in redirects? Just curious...

              - tye (but my friends call me "Tye")
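tye's point about interpolation can be checked with a few lines of core Perl. The hostile-looking values below are made up for the demonstration. The sketch only shows that double-quote interpolation copies characters into the string and never runs them.

```perl
use strict;
use warnings;

# Hypothetical hostile input: backticks and a shell-style $(...) construct.
my $user   = '`date`';
my $domain = '$(id).example';

# Interpolation and explicit concatenation build the same string ...
my $interpolated = "mailto:$user\@$domain";
my $concatenated = 'mailto:' . $user . '@' . $domain;
print "same string\n" if $interpolated eq $concatenated;

# ... and the backticks come through as literal characters.
print "$interpolated\n";    # prints: mailto:`date`@$(id).example
```

Nothing here executes until such a string is passed to something that evaluates it, such as a shell, `eval`, or backticks in the source itself.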
Re: E-mail Redirect (for protecting addresses from E-mail-Address-Collecting Bots)
by pope (Friar) on Jun 29, 2001 at 09:45 UTC
    That's not gonna work, davisagli.
    As tye already pointed out, a well-written robot will follow your redirect and eventually harvest the valid mailto: URL which is returned by your script. For example, the following snippet catches the email address:
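    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Status;    # exports RC_BAD_REQUEST

    # $url is the link to the redirect script, e.g. ".../this_script.pl?me,mydomain"
    my $req = HTTP::Request->new('GET', $url);
    my $res = LWP::UserAgent->new->request($req);
    if ($res->is_success) {
        # do something here
    }
    elsif ($res->code == RC_BAD_REQUEST) {
        # the GET on the mailto: target failed; the address is in the
        # Location header of the previous (redirect) response
        print "URL: ", $res->previous->header('Location'), "\n";
    }
    else {
        # do something else
    }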
