PerlMonks  

Limit submissions over time?

by deadbarnacle (Initiate)
on Jun 18, 2006 at 06:43 UTC (#556055)
deadbarnacle has asked for the wisdom of the Perl Monks concerning the following question:

Hello all :)

I am new O_O

I'm using a bare-bones CGI web form mailer, which is quite open to abuse in its current incarnation. I'd like to add a few lines to the script that will limit people to one submission once every $TIME. I'd perhaps like to limit the length of submissions, but that's less important -- I don't care so much if "War and Peace" is sent to me, just as long as it's not sent 10,000 times :)

Also, just limiting the number of submissions would be easier for me to understand (I'm a Perl newbie).

Here is my junk:

```perl
#!/usr/bin/perl
use CGI;

my $query = new CGI;
print $query->header ( );

my $comments = $query->param("comments");

open ( MAIL, "| /usr/sbin/sendmail -t" );
print MAIL "To: melissa\@secretemail.com\n";
print MAIL "Subject: Form Submission\n\n";
print MAIL "$comments\n";
print MAIL "\n.\n";
close ( MAIL );

# barf out html message
print <<END_HTML;
<html>
<p>thanks! This is what you sent me:</p>
$comments
</html>
END_HTML
```


Thanks in advance!
--
m j teigen

Re: Limit submissions over time?
by davido (Archbishop) on Jun 18, 2006 at 08:44 UTC

    Well, here are some of the challenges you'll face if you wish to limit how many times a particular individual is able to send you messages (in no particular order):

    • You cannot rely on environment variables to check IPs or domains. In some cases many users will appear to be from the same IP or domain. In other cases, some users' info simply won't be available. In still other cases, the info that is available can be spoofed or otherwise wrong. So rule CGI environment variables out as a means of 'authentication'.
    • You can't rely on cookies, unless you require that a cookie be present before a mail message can be sent. The cookie could contain an MD5 hash as identification that you keep track of for some period of time. This method would work, but would prevent access for folks who have cookies turned off.
    • You could require a login, but that means maintaining user lists, which adds complexity and might be inconvenient enough for people that they won't send a message in the first place.
    • Even if you do prevent an individual from posting multiple times, you may still be leaving the door open to a many-source DoS attack, where a large number of "bad" machines gang up on you at once.
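    For the cookie point above, here is a minimal sketch of issuing and checking an MD5-based token. It assumes an in-memory hash standing in for real storage (a CGI script starts a fresh process per request, so in practice you would need a file or a database), and the helper names issue_token and token_may_send are hypothetical. Digest::MD5 ships with Perl.

    ```perl
    use strict;
    use warnings;
    use Digest::MD5 qw(md5_hex);

    # Hypothetical store: token => epoch time of the last message sent with
    # it (undef means issued but not used yet). A real CGI script needs a
    # file or database here, because every request is a new process.
    my %last_sent;

    # Create an identifier to hand out in a cookie. Note: this is not
    # cryptographically strong, since rand() is predictable.
    sub issue_token {
        my $token = md5_hex( join ':', time(), $$, rand() );
        $last_sent{$token} = undef;
        return $token;
    }

    # Allow a message only for a token we issued, and only if at least
    # $min_gap seconds have passed since that token last sent one.
    sub token_may_send {
        my ( $token, $now, $min_gap ) = @_;
        return 0 unless defined $token && exists $last_sent{$token};
        my $prev = $last_sent{$token};
        return 0 if defined $prev && $now - $prev < $min_gap;
        $last_sent{$token} = $now;
        return 1;
    }
    ```

    You would send the token with the form page in a cookie, then call token_may_send with the cookie's value, time(), and your $TIME before piping anything to sendmail.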

    Every practical and reliable means of preventing abuse has trade-offs manifesting as reduced convenience and/or reduced compatibility for the end users, while at the same time increasing complexity for your script.

    At the very least, you probably ought to look into the CGI::Session module, which could make it easier to add session management to your script. You might also find it helpful to buy, borrow, or check out from the library a copy of "CGI Programming with Perl" (O'Reilly & Associates), 2nd edition. It devotes a lot of discussion to subjects such as email and session management. It's a good read, IMHO. Also, don't do mail by hand. Use a module such as MIME::Lite for its simplicity, reliability, and robustness.


    Dave

      Well, you can rely on environment variables to check the IP of the machine connecting to the server, as that value is set by your local web server (assuming you trust your local web server, that is). Yes, there are issues, but I don't think it's worth ruling them out entirely: not for authentication, but they can still be used for authorization, if you know where the problems are.

      REMOTE_ADDR is very reliable. The problem is that it might not be the IP of the machine the person is actually connecting from; if there is a proxy in between, it's the proxy's address.

      Many proxies also add an X-Forwarded-For header (visible to CGI scripts as HTTP_X_FORWARDED_FOR), but they're not required to, and the addresses in it aren't necessarily routable. That means the same non-routable address seen through two different proxy servers may belong to two different machines, so it isn't really a collision.
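      To see which forwarded addresses fall in non-routable space, a check against the RFC 1918 private IPv4 ranges is enough for a rough cut. This is a sketch with a hypothetical helper name; it handles IPv4 only, and a real X-Forwarded-For header may carry a comma-separated list of addresses rather than a single one.

      ```perl
      use strict;
      use warnings;

      # Return true if a dotted-quad IPv4 address falls in one of the
      # RFC 1918 private ranges (10/8, 172.16/12, 192.168/16). These are
      # reused behind many different proxies, so the same value seen via
      # two proxies need not be the same client.
      sub is_private_ipv4 {
          my ($ip) = @_;
          my @o = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/
              or return 0;
          return 0 if grep { $_ > 255 } @o;
          return 1 if $o[0] == 10;
          return 1 if $o[0] == 172 && $o[1] >= 16 && $o[1] <= 31;
          return 1 if $o[0] == 192 && $o[1] == 168;
          return 0;
      }
      ```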

      If you're just looking for _some_ sort of rate throttling (i.e., better than nothing at all), I'd use a combination of REMOTE_ADDR and HTTP_X_FORWARDED_FOR. I'd probably not worry about the issues with non-routable collisions, and would keep track of the following:

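      ```perl
      # track() stands for your own tracking code (see the note below).
      if ( defined $ENV{'HTTP_X_FORWARDED_FOR'} ) {
          track( ':' . $ENV{'HTTP_X_FORWARDED_FOR'} );    # forwarded client alone
          track( $ENV{'REMOTE_ADDR'} . ':' . $ENV{'HTTP_X_FORWARDED_FOR'} );    # proxy + client
      }
      else {
          track( $ENV{'REMOTE_ADDR'} );    # direct connection
      }
      ```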

      (The specific tracking code depends on what you're planning, how much memory you have, and what other resources, e.g. a database, you have available.)
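      One possible shape for the track() calls above, as a minimal in-memory sketch: it assumes track() should return true when a request under that key may proceed and false when the same key was seen less than $MIN_GAP seconds ago. The hash, the return value, and the 60-second gap are all assumptions; a real CGI script would need a file (with locking) or a database, since each request runs in a new process.

      ```perl
      use strict;
      use warnings;

      my $MIN_GAP = 60;    # assumed: seconds required between requests per key
      my %last_seen;       # key => epoch time of the last allowed request

      # Return 1 if this key may proceed now, 0 if it is too soon.
      sub track {
          my ( $key, $now ) = @_;
          $now = time() unless defined $now;
          if ( exists $last_seen{$key} && $now - $last_seen{$key} < $MIN_GAP ) {
              return 0;
          }
          $last_seen{$key} = $now;
          return 1;
      }
      ```

      With this version the caller would refuse the submission if any of its track() calls returns false.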

      Now, let's look at the flaw in my plan: anyone can send whatever they want in X-Forwarded-For, which makes them look like a proxy server, so if they put something random in it each time, you'd never be rate limiting them. (It's possible that the original poster would want to rate limit proxy servers themselves at some smaller interval, just to keep the 10,000 possibility down.)

      Personally, I'd just impose an extra sleep when collisions occur behind a proxy: if you slow it down to one submission every 30 seconds or so, it becomes much less attractive to abuse. (And remember, in whatever tracking system you use, log each request when it comes in but set its timestamp to the time it's expected to run. That way, if something else comes in while the first request is sleeping, it won't just wait $time from its own arrival; it'll wait $time past the current one finishing.)
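      A sketch of the "set the timestamp to when it's expected to run" idea, assuming a per-key hash of the next free slot and a 30-second spacing (both assumptions, as is the helper name reserve_slot). Because the slot is booked at reservation time, a second request that arrives during the first one's sleep queues behind it instead of waiting only $SPACING from its own arrival.

      ```perl
      use strict;
      use warnings;

      my $SPACING = 30;    # assumed: seconds between sends for one key
      my %next_free;       # key => earliest epoch time the next send may run

      # Book the next slot for $key and return how many seconds the caller
      # should sleep before sending.
      sub reserve_slot {
          my ( $key, $now ) = @_;
          my $start = $now;
          $start = $next_free{$key}
              if exists $next_free{$key} && $next_free{$key} > $now;
          $next_free{$key} = $start + $SPACING;    # record the expected run time
          return $start - $now;
      }
      ```

      In the CGI script you would call something like sleep(reserve_slot($key, time())) before opening the pipe to sendmail.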

      Just remember: nothing you can do will ever make spamming impossible. You just need to make things harder on the spammer so they'll try somewhere else, hopefully without imposing too much of a burden on your legitimate users.

        You know, another strategy might be to do some sort of a diff calculation on incoming mail, and if it appears to be, within a certain tolerance for error, approximately equal to one of the past five messages you received, block it.
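        A rough sketch of the "approximately equal to one of the past five messages" check. Instead of a true diff it swaps in a simpler measure, Jaccard similarity on the sets of lowercased words, and blocks anything at or above an assumed 0.9 threshold. The threshold, the helper names, and keeping history in memory are all assumptions.

        ```perl
        use strict;
        use warnings;

        my $THRESHOLD = 0.9;    # assumed tolerance: 1.0 means identical word sets
        my $KEEP      = 5;      # how many past messages to compare against
        my @recent;             # word-set hashrefs of recently accepted messages

        # Lowercased set of the words in a message.
        sub word_set {
            my ($text) = @_;
            my %set = map { $_ => 1 } grep { length } split /\W+/, lc $text;
            return \%set;
        }

        # Jaccard similarity: size of intersection / size of union.
        sub jaccard {
            my ( $x, $y ) = @_;
            my %union  = ( %$x, %$y );
            my $total  = keys %union;
            return 1 unless $total;    # two empty messages count as equal
            my $common = grep { exists $y->{$_} } keys %$x;
            return $common / $total;
        }

        # Return 1 (and remember the message) unless it is a near-duplicate
        # of one of the last $KEEP messages; return 0 to block it.
        sub accept_message {
            my ($text) = @_;
            my $words = word_set($text);
            for my $old (@recent) {
                return 0 if jaccard( $words, $old ) >= $THRESHOLD;
            }
            push @recent, $words;
            shift @recent while @recent > $KEEP;
            return 1;
        }
        ```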

        This could be made even more secure if you also implement session management (CGI::Session, for example), and even more secure if you also require logins. But again each level of additional protection means additional assumptions about the end user, and/or additional hoops for the end user to jump through.


        Dave

Re: Limit submissions over time?
by hesco (Deacon) on Jun 18, 2006 at 09:07 UTC
    Davido offers sage advice above. The first thing that screamed at me was that you are handling CGI parameters without the safety net of taint mode (perl -T). Taint mode would make you reset or delete a bunch of environment variables (PATH, IFS, CDPATH, ENV, BASH_ENV) before you start opening filehandles on pipes to programs like sendmail.
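    Here is a sketch of the cleanup taint mode insists on, plus untainting the comments field with a regex capture. You would turn taint mode on with -T on the #! line; the code below runs with or without it. The PATH value, the accepted-character pattern, the 5000-character cap, and the helper name untaint_comments are assumptions to adjust for your server and form.

    ```perl
    use strict;
    use warnings;

    # Give external programs (such as sendmail) a known PATH and remove
    # variables a shell might honour. Taint mode refuses to run a pipe
    # open until this is done.
    $ENV{PATH} = '/usr/sbin:/usr/bin:/bin';
    delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};

    # Untaint form input by capturing only what you are willing to accept:
    # printable characters, tabs, and newlines, at most 5000 characters.
    sub untaint_comments {
        my ($raw) = @_;
        return unless defined $raw;
        my ($clean) = $raw =~ /\A([[:print:]\r\n\t]{1,5000})\z/;
        return $clean;    # undef if the input did not match
    }
    ```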

    If you assume that "a user" = "a machine with a cookie", CGI::Session could be your ticket. Expire your sessions after $TIME, set the cookie as you send the email, and don't permit another to be sent until the cookie expires.

    Of course this can be easily defeated by deleting the cookies on a machine and proceeding to abuse this mechanism.

    -- Hugh

    if( $lal && $lol ) { $life++; }
Re: Limit submissions over time?
by virtualsue (Vicar) on Jun 18, 2006 at 16:40 UTC
    You might find the NMS Project scripts to be of interest, since you are a relative newbie doing web programming. The NMS Project was organized and is supported by several contributors to this site who have quite a lot of practical experience with CGI programming in Perl. The form-to-email task in particular is one that has been performed many, many times, so you might want to see how somebody with experience has done it.
Re: Limit submissions over time?
by TedPride (Priest) on Jun 19, 2006 at 05:07 UTC
    Actually, IP addresses are a very good way to limit spam. Sure, a determined person can hide his IP address from you by using intermediaries, but even this has its limits, and the same determined person can kill your site much more easily using a variety of other methods. What you're trying to prevent is not the l33t hacker types, but rather your average spammer, who rarely bothers to hide his IP address.

    What you do is keep a database of:

    ID number (always include this)
    IP address (stored as 3 bytes corresponding to the first 3 sections of the IP, not the IP's text representation)
    IP address index (for fastest access)
    Last time accessed (timestamp)
    Bad access count (defined as the number of times accessed within a certain time period of the last time accessed)
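    Storing only the first three octets, as suggested, means every address in the same /24 shares one record (so neighbours on the same ISP subnet are counted together). A minimal sketch of building that 3-byte key with pack, under a hypothetical helper name and for IPv4 only:

    ```perl
    use strict;
    use warnings;

    # Pack the first three octets of a dotted-quad IPv4 address into a
    # 3-byte binary string; returns undef for anything that isn't one.
    sub ip_prefix_key {
        my ($ip) = @_;
        my @o = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.\d{1,3}$/
            or return;
        return if grep { $_ > 255 } @o;
        return pack 'C3', @o;
    }
    ```

    unpack('C3', $key) turns the stored bytes back into the three numbers if you ever need to display them.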

    Basically, you look for a record in your database with the IP address of the user you want to check. If it's there, you check to see if the bad access count is over the set limit. If it is, you exit. If not, you check the last time accessed. If the last time accessed is too recent, update the bad access count and exit if the new count is too high. Update the last time accessed and proceed.
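    The steps just described can be sketched with an in-memory hash standing in for the database table. The window, the limit, and the helper name check_access are assumptions; with a limit of 2 bad accesses, a rapid-fire sender gets three submissions through before being refused, matching the "two or three times" estimate below.

    ```perl
    use strict;
    use warnings;

    my $WINDOW  = 300;    # assumed: a hit this soon after the last one is "bad"
    my $MAX_BAD = 2;      # assumed: bad hits tolerated before refusing

    # In-memory stand-in for the table: key => record.
    my %table;
    my $next_id = 1;

    # Return 1 to proceed, 0 to exit, following the steps above.
    sub check_access {
        my ( $key, $now ) = @_;
        my $rec = $table{$key};
        unless ($rec) {    # no record yet: create one and proceed
            $table{$key} = { id => $next_id++, last_access => $now, bad_count => 0 };
            return 1;
        }
        return 0 if $rec->{bad_count} > $MAX_BAD;    # already over the limit
        if ( $now - $rec->{last_access} < $WINDOW ) {    # too recent
            $rec->{bad_count}++;
            return 0 if $rec->{bad_count} > $MAX_BAD;
        }
        $rec->{last_access} = $now;
        return 1;
    }
    ```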

    Oh, and the exit procedure might include something for adding that IP to the global site ban list (see Apache .htaccess IP bans), in which case you'll also want to remove the banned IP from your table so it's not cluttering things up. And you'll run an automatic procedure every so often to remove records from your table that have a last accessed time of more than x days ago, so the table remains small and efficient.
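    The periodic clean-up run could look like the following sketch, which works on a hash of records keyed as above. The 7-day value for "x days" and the helper name prune_table are assumptions; against a real database this would instead be a single DELETE of rows whose last access is older than the cutoff.

    ```perl
    use strict;
    use warnings;

    my $MAX_AGE_DAYS = 7;    # assumed value for "x days"

    # Remove records whose last access is older than $MAX_AGE_DAYS and
    # return how many were deleted. $table maps key => { last_access => epoch, ... }.
    sub prune_table {
        my ( $table, $now ) = @_;
        my $cutoff = $now - $MAX_AGE_DAYS * 24 * 60 * 60;
        my @old = grep { $table->{$_}{last_access} < $cutoff } keys %$table;
        delete @{$table}{@old};
        return scalar @old;
    }
    ```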

    Your average spammer will be able to spam maybe two or three times (depending on how you set your limits), then get cut off automatically after that, with little server-side processing time involved. People who spam by mistake won't get blocked, since your regular clean-up run will clear out their bad access count every x days.

Node Type: perlquestion [id://556055]
Approved by willyyam