
PMiltering fun

by Tanktalus (Canon)
on Jun 20, 2008 at 01:30 UTC ( #693040=perlmeditation )

On my home network, I have a publicly visible domain, complete with MX record sending email directly to me. I'm sure I'm not alone among monks that have the headache of hundreds of spams hitting their mail server every minute of every hour, 24 hours a day, ... well, you get the idea. It doesn't stop.

Up until about 6 or 7 months ago, I had an old P2-233 running OS/2 which happily quashed most of the traffic with an OS/2 SMTP server that allowed filters written in any language to deal with inbound traffic while the server handled the protocol. Originally, my filter was in REXX, but eventually after I learned perl, I rewrote it. But that's not the point of this meditation, it's merely background. When the machine died (as in, I couldn't even turn the hardware on anymore), I moved the whole thing over to another box running Linux and Postfix. Unfortunately, I had no idea how to kill inbound email addressed to machines that didn't even exist.

So I didn't. I figured I'd get to it eventually. (Yes, I know, it was bad.) That day was rushed when my ISP blocked outbound (but, thankfully, not inbound) anything on port 25, forcing all outbound email to go through their SMTP servers. Those servers would then shut me down because of all the bounceback: they figured I had to be infected with a virus to be sending out that much email. So, I cleverly blocked my email server from outbound on 25 myself (at the router), and then wrote a cron job that ran *every three minutes* to clear out the backlog of email that could no longer be sent (reading the output from mailq, parsing it, and sending message IDs to "postsuper -d -" for deletion). It was a hack. And a bad one at that. But it worked.
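The parsing half of that cron job might look roughly like the sketch below. The original script is not shown in the post, so everything here is an assumption: the function name `extract_queue_ids` is made up for illustration, and the sample text only imitates the usual Postfix `mailq` layout (queue ID in the first column, optionally followed by `*` or `!`, then the message size).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pull queue IDs out of text laid out like Postfix "mailq" output.
# A queue entry starts in column one with an alphanumeric ID, an optional
# status flag (* = active, ! = on hold), and then the message size.
sub extract_queue_ids {
    my ($mailq_output) = @_;
    my @ids;
    for my $line (split /\n/, $mailq_output) {
        push @ids, $1 if $line =~ /^([0-9A-Za-z]+)[*!]?\s+\d+\s/;
    }
    return @ids;
}

# Hypothetical sample, not taken from a real queue.
my $sample = <<'END';
-Queue ID-  --Size-- ----Arrival Time---- -Sender/Recipient-------
A1B2C3D4E5*    1234 Fri Jun 20 01:30:00  MAILER-DAEMON
                                         someone@example.com

F6E5D4C3B2     2345 Fri Jun 20 01:31:00  MAILER-DAEMON
(connect to mx.example.net[192.0.2.1]:25: Connection timed out)
                                         other@example.net

-- 3 Kbytes in 2 Requests.
END

my @ids = extract_queue_ids($sample);
print "$_\n" for @ids;
```

In a real cron job the text would come from running `mailq`, and the IDs would be written one per line to the standard input of `postsuper -d -`, which is what the trailing `-` asks for.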

Until a typo in my DNS server worked around this. So I figured I really had to find a way to get Postfix to stop even accepting bad emails. Unfortunately, I really have no idea how those Postfix programs work or what their API is. And I didn't notice anyone having solved this generically on CPAN. The next best thing appeared to be Sendmail::Milter, as Postfix advertised Sendmail Milter compatibility. However, I read its ratings (http://cpanratings.perl.org++!!!) and decided that I didn't want to install sendmail to install this, so I checked out Sendmail::PMilter, a pure-perl implementation.

I quickly got to work:

#!/usr/bin/perl
use strict;
use warnings;

use Sendmail::PMilter;

my $conn = 'local:/var/run/mymilter.sock';

my $milter = Sendmail::PMilter->new();
$milter->setconn($conn);

# no callbacks yet, just prove that the plumbing works
$milter->register('mymilter',
                  {},
                  Sendmail::PMilter::SMFI_CURR_ACTS
                 );

# don't run as root
$< = $> = getpwnam 'nobody';
$milter->main();

First thing - Postfix wasn't connecting. Without being able to waste a bunch of time here, I just changed $conn to 'inet:33333@127.0.0.1'. Problem solved - connections started happening.
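The post does not show the Postfix side of this connection. As an assumed example only, a main.cf fragment that points Postfix at the same TCP endpoint could look like the following. Note that the two programs spell the address differently: Sendmail::PMilter takes the Sendmail form `inet:port@host`, while Postfix expects `inet:host:port`.

```
# /etc/postfix/main.cf (fragment, assumed setup)
# Same endpoint as 'inet:33333@127.0.0.1' in the milter script.
smtpd_milters = inet:127.0.0.1:33333
# While testing, keep accepting mail if the milter is unreachable.
# (The Postfix default is tempfail.)
milter_default_action = accept
```

Why the `local:` socket failed is not diagnosed in the post. Two common causes, offered as guesses rather than findings, are socket file permissions and a chrooted smtpd that cannot see paths under /var/run.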

So far, so good. So I went and added in my filter. In my case, I wanted to ensure that the destination address for each email actually was a valid host name. So I had:

$milter->register('mymilter',
                  {
                      envrcpt => \&my_envrcpt_callback,
                  },
                  Sendmail::PMilter::SMFI_CURR_ACTS
                 );
openlog 'mymilter', 'pid', Unix::Syslog::LOG_MAIL();

and up top:

use Unix::Syslog qw(:macros :subs);

sub is_valid_host {
    my $host = shift;
    return undef unless $host;
    my ($name,$aliases,$addrtype,$net) = gethostbyname($host);
    defined $name;
}

sub my_envrcpt_callback {
    my $ctx = shift;
    my $rcpt_addr = $ctx->getsymval('{rcpt_addr}');
    my ($fqdn) = ($rcpt_addr =~ /\@(\S+)/);

    if (not is_valid_host($fqdn)) {
        # no such machine?
        syslog LOG_INFO, "$$: Rejected mail for $fqdn";
        return Sendmail::PMilter::SMFIS_ACCEPT;
    }
    Sendmail::PMilter::SMFIS_ACCEPT;
}

Seems pretty straight-forward, right? As you can see, I was accepting even the failures, but that was because I was testing still. Once I saw the syslog with correct entries, I changed that to SMFIS_REJECT, and suddenly email started dropping. Beautiful. However, there were still a few issues. First off, checking hosts is a network operation, and thus can be slow. Since these don't change often (it's a home network), I figured a bit of caching could help:

my %cache;
sub is_valid_host {
    my $host = shift;
    return undef unless $host;

    $host = lc $host;
    unless (exists $cache{$host}) {
        my ($name,$aliases,$addrtype,$net) = gethostbyname($host);
        $cache{$host} = $name;
    }
    $cache{$host};
}

This sped things up noticeably. (Though if the DNS server is on the same machine, the gain is probably negligible - I'm just not sure I'll always have the email and the DNS on the same machine, and my testing of this code on a different machine showed this problem.) Normally, I don't bother with minor performance issues, but hundreds of DNS queries per minute seemed like a candidate, even though the problem was larger in test than it would be in production.
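One property of the cache above is that an entry never expires, so a host added to DNS after a failed lookup stays rejected until the process restarts. A minimal sketch of a variant with expiry times follows. It is not the code from the post: the function name `is_valid_host_ttl`, the two lifetimes, and the injectable `$resolver` and `$now` arguments are all assumptions made so that the logic can be exercised without touching DNS.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lifetimes in seconds; tune to taste.
my $POSITIVE_TTL = 3600;    # hosts that resolved
my $NEGATIVE_TTL = 300;     # hosts that did not

my %cache;    # lowercased host => { ok => 0|1, expires => epoch seconds }

# $resolver and $now are optional; they default to a real DNS lookup and
# the current time, and exist so the caching logic can be tested offline.
sub is_valid_host_ttl {
    my ($host, $resolver, $now) = @_;
    return 0 unless defined $host && length $host;

    $resolver ||= sub { defined scalar gethostbyname($_[0]) };
    $now = time unless defined $now;
    $host = lc $host;

    my $entry = $cache{$host};
    if (!$entry || $entry->{expires} <= $now) {
        my $ok = $resolver->($host) ? 1 : 0;
        $entry = $cache{$host} = {
            ok      => $ok,
            expires => $now + ($ok ? $POSITIVE_TTL : $NEGATIVE_TTL),
        };
    }
    return $entry->{ok};
}

# Demo with a fake resolver that counts how often it is consulted.
my $lookups = 0;
my $fake = sub { $lookups++; $_[0] eq 'mail.example.com' };

print is_valid_host_ttl('mail.example.com', $fake, 1000), "\n";  # looked up
print is_valid_host_ttl('MAIL.example.com', $fake, 1001), "\n";  # served from cache
print "lookups so far: $lookups\n";
```

Either way the cache lives in ordinary process memory, so if the dispatcher forks workers, each worker builds up its own copy.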

The next issue was how the dozen or so child processes were being used. Sendmail::PMilter automatically spawned off subprocesses like any good daemon should, and I just wondered if I really needed them, and how they were being used. I had no idea. So I created a global $msg_count, set to 0, and in my_envrcpt_callback, simply did this:

++$msg_count;
$0 = "[mymilter:$msg_count@" . scalar(localtime) . "] Checking $fqdn ($rcpt_addr)";

Now I could see utilisation (or a reasonable facsimile thereof) via ps. I noticed that only about 3 or 4 processes were handling all the requests. This seemed odd to me. I tried one of Sendmail::PMilter's other dispatchers by setting $ENV{PMILTER_DISPATCHER} = 'prefork'; at the top of the script. Suddenly, the processes were being more evenly used. And fewer processes, mind you. It seemed also (from uptime) that I was using less CPU, so I stuck with it.

However, then I found that my milter stopped responding to Postfix after a while. I couldn't figure out why, so I added an "alarm 30" to the top of my_envrcpt_callback (if I stop hearing from Postfix for 30 seconds, tell me, and reset each time I get an email to look at). And I added a global $SIG{ALRM} = sub { die "alarm\n" };.
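The idle-watchdog idea can be seen in isolation in the sketch below, which is a stand-alone demonstration and not part of the milter. The names `handle_message` and `run_until_idle` are hypothetical, and the limit is shortened from 30 seconds to 1 so the effect shows up quickly. Each call re-arms the timer, so the alarm only fires once the calls stop coming.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $IDLE_LIMIT = 1;    # seconds; the post uses 30, shortened for the demo

# The trailing "\n" keeps Perl from appending "at ... line N" to the message.
$SIG{ALRM} = sub { die "alarm\n" };

my $handled = 0;

# Stand-in for my_envrcpt_callback: every call pushes the deadline back.
sub handle_message {
    alarm $IDLE_LIMIT;
    return ++$handled;
}

# Handle a short burst of messages, then go quiet. Returns why the run ended.
sub run_until_idle {
    my $reason = 'finished';
    eval {
        for (1 .. 3) {
            handle_message();
            select(undef, undef, undef, 0.2);   # gaps shorter than the limit
        }
        select(undef, undef, undef, 3);         # silence longer than the limit
        1;
    } or do {
        $reason = $@ eq "alarm\n" ? 'idle timeout' : "error: $@";
    };
    alarm 0;    # cancel anything still pending
    return $reason;
}

my $reason = run_until_idle();
print "$reason after $handled messages\n";
```

The demo wraps the work in eval so that it can report what happened. The script in the post has no such wrapper, so there the die simply ends the process that was idle.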

And now this is what I have:

#!/usr/bin/perl
use strict;
use warnings;

use Sendmail::PMilter;
use Unix::Syslog qw(:macros :subs);

my $conn = 'inet:33333@127.0.0.1';
$ENV{PMILTER_DISPATCHER} = 'prefork';

# lowercased host name => canonical name (undef if the lookup failed)
my %cache;
sub is_valid_host {
    my $host = shift;
    return undef unless $host;

    $host = lc $host;
    unless (exists $cache{$host}) {
        my ($name,$aliases,$addrtype,$net) = gethostbyname($host);
        $cache{$host} = $name;
    }
    $cache{$host};
}

$SIG{ALRM} = sub { die "alarm\n" };

{
    my $msg_count = 0;
    sub my_envrcpt_callback {
        my $ctx = shift;

        # if we're not used in 30 seconds, quit.
        alarm 30;

        my $rcpt_addr = $ctx->getsymval('{rcpt_addr}');
        my ($fqdn) = ($rcpt_addr =~ /\@(\S+)/);

        # show what this process is doing in ps output
        ++$msg_count;
        $0 = "[mymilter:$msg_count@" . scalar(localtime) . "] Checking $fqdn ($rcpt_addr)";

        if (not is_valid_host($fqdn)) {
            # no such machine?
            syslog LOG_INFO, "$$: Rejected mail for $fqdn";
            return Sendmail::PMilter::SMFIS_REJECT;
            #return Sendmail::PMilter::SMFIS_ACCEPT;
        }
        Sendmail::PMilter::SMFIS_ACCEPT;
    }
}

my $milter = Sendmail::PMilter->new();
$milter->setconn($conn);
$milter->register('mymilter',
                  {
                      envrcpt => \&my_envrcpt_callback,
                  },
                  Sendmail::PMilter::SMFI_CURR_ACTS
                 );
openlog 'mymilter', 'pid', Unix::Syslog::LOG_MAIL();

# don't run as root
$< = $> = getpwnam 'nobody';
syslog LOG_INFO, "Starting up: $$";

END { closelog }

$milter->main(10,100);

And it seems to be working. My cron job is still running, but it now deals with a relatively minor amount of email (addressed to a real host, but with no user by that name there: it gets rejected there, comes back to the mail server, but can't get back out due to the block above). I now have a bit of infrastructure set up such that, if I find the need and the time, I could validate the destination email address and reject immediately. Not sure how, yet, but we'll see.
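One possible shape for that recipient check is sketched below. It is only an illustration under stated assumptions: the `%known_users` table, the function names `split_rcpt` and `is_known_rcpt`, and the choice to compare local parts case-insensitively are all invented here, and a real setup would load the user list from wherever the accounts actually live.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical table of deliverable mailboxes, keyed by lowercased host.
my %known_users = (
    'mail.example.com' => { map { $_ => 1 } qw(alice bob postmaster) },
);

# Split "<user@host>" or "user@host" into (local part, lowercased host).
# Returns an empty list if the address has no usable "@host" part.
sub split_rcpt {
    my ($addr) = @_;
    return unless defined $addr;
    $addr =~ s/^\s*<//;
    $addr =~ s/>\s*$//;
    my ($local, $host) = $addr =~ /^(.+)\@([^\s\@]+)$/ or return;
    return ($local, lc $host);
}

# True only if both the host and the mailbox on it are known.
# Assumption: local parts are compared case-insensitively.
sub is_known_rcpt {
    my ($addr) = @_;
    my ($local, $host) = split_rcpt($addr) or return 0;
    my $users = $known_users{$host} or return 0;
    return $users->{lc $local} ? 1 : 0;
}

for my $addr ('<Alice@MAIL.example.com>', 'mallory@mail.example.com', 'not-an-address') {
    printf "%-28s %s\n", $addr, is_known_rcpt($addr) ? 'accept' : 'reject';
}
```

In the milter, a false result from such a check would be turned into SMFIS_REJECT in the envrcpt callback, the same way the host check already is.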

(PS - this was far too much for GrandFather's PMEdit to render and paste... *sigh* :-) )

Replies are listed 'Best First'.
Re: PMiltering fun
by rhesa (Vicar) on Jun 20, 2008 at 13:19 UTC
    Wouldn't this be far easier accomplished by using the mydestination setting? That's how you control which hosts you accept mail for.

    It's also a good idea to have a local caching resolver running anyway, especially if you have multiple machines in your local network. I use Bind, and have it forward unknown requests to my ISP. Those are then cached locally.

    forwarders { 1.2.3.4; 2.3.4.5; };

    I can also heartily recommend implementing dns blacklists in your smtp daemon. I'm very, very happy with zen.spamhaus.org, which drops about 75% of incoming spam. I also use some of the rfc-ignorant.org blacklists, but haven't really seen much benefit of it.
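For readers who have not used a DNS blacklist before, the lookup itself is just an ordinary DNS query on a specially built name: the client's IPv4 octets are reversed and the list's zone is appended. A minimal sketch of that name construction, with the hypothetical function name `dnsbl_query_name`, is:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the name to look up for an IPv4 address in a DNS blacklist zone:
# octets reversed, then the zone. Returns undef for anything that is not
# a dotted-quad IPv4 address.
sub dnsbl_query_name {
    my ($ip, $zone) = @_;
    my @octets = $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/ or return;
    return if grep { $_ > 255 } @octets;
    return join('.', reverse(@octets), $zone);
}

# 127.0.0.2 is the conventional test address for blacklists.
my $name = dnsbl_query_name('127.0.0.2', 'zen.spamhaus.org');
print "$name\n";
```

If that name resolves, the address is listed. In Postfix the whole check is a built-in restriction (`reject_rbl_client zen.spamhaus.org`), so no custom code is needed there.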

    Top it all off with bogofilter or another (bayesian) spam filter, and email life is good again. I see no spam in my inbox, and only have about 5 to 10 emails per day that the spam filter couldn't classify. I can live with those numbers!

      The mydestination setting is fine if your destination list doesn't change. But, IMO, it suffers from the data-duplicated-multiple-times syndrome. I already have this information in my DNS, duplicating it somewhere else seems like a huge waste of scarce resources (that being my ability to remember to do this should I change my network topology).

      I plan on inserting a spam filter, too, but last time I tried, email crawled to a halt because my poor machine couldn't keep up with it. This is kind of the first step in reclaiming that: by eliminating over 90% of the spam based on bad domain names, I will only need to check 10%. Even that will likely bring my P3-550 to a crawling halt, so I'm going to have to set up a distributed spam check (spamd running on another machine) somehow.

      Running a caching bind server on a small machine vs caching my own lookups... hmm... ;-) I suspect that for this machine, it's cheaper in both CPU and RAM to cache inside my milter.

      As for a rbl, I didn't really think of trying it until this. So thanks :-) (It makes me even more glad I posted this - I never would have imagined such a useful response, but I got it anyway.)

        There are settings for Postfix to only accept mail for domains for which it is the MX record. That would solve that problem. The mydestination setting isn't duplicated data, though, because I can easily set up a non-public email domain for testing purposes. There are provisions in RFC 2821 for delivering to a machine with an A record with no MX record, too.

        If you really want robust spam filtering in Perl, you could install amavisd-new as your MX-receiving SMTP server and forward mail that passes to Postfix. I recommend having a spam address and a ham address that amavis uses for Bayesian learning. Configure that anything coming from your Postfix outbound SMTP server to Amavis at those addresses gets processed accordingly, and then training your Bayesian filter is as simple as forwarding mail.

        The most successful anti-spam technique I've ever found, though, is to keep track of the number of invalid recipients from particular blocks of addresses, typically /24 blocks. You can measure in percentages of overall "RCPT TO" requests that fail, or a threshold of failed receipts per hour/day. Then, you can reject mail at the SMTP level from those blocks or, like I did, reject or drop packets with iptables or ipfilter from those blocks on your MX server. The configuration for either Postfix or iptables is easy to wrap in Perl. (So are amavis, shorewall, and more, of course). Be sure to have a list of exceptions, though, because you might not want to cut yourself off from AOL, Yahoo, and other public email sites (I couldn't, using this for a commercial ISP). AOL has a list of all the ranges their outgoing email servers use, though, so they're pretty easy.
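The bookkeeping behind that technique can be sketched in a few lines. This is an illustration, not the implementation described in the reply: the threshold, the exemption list, and the function names are all hypothetical, and the counters here never reset, whereas a per-hour or per-day window would also need timestamps.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $THRESHOLD = 3;                            # hypothetical: failures tolerated per block
my %exempt    = ('198.51.100.0/24' => 1);     # hypothetical exception list
my %failures;                                 # "/24 block" => failed RCPT TO count

# Map an IPv4 address to its /24 block, e.g. 192.0.2.17 -> 192.0.2.0/24.
sub block_of {
    my ($ip) = @_;
    my ($prefix) = $ip =~ /^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}$/ or return;
    return "$prefix.0/24";
}

# Call this whenever a client is refused for an invalid recipient.
sub record_failure {
    my ($ip) = @_;
    my $block = block_of($ip) or return;
    return ++$failures{$block};
}

# True once a non-exempt block has reached the threshold.
sub is_blocked {
    my ($ip) = @_;
    my $block = block_of($ip) or return 0;
    return 0 if $exempt{$block};
    return ($failures{$block} || 0) >= $THRESHOLD ? 1 : 0;
}

record_failure($_) for qw(192.0.2.10 192.0.2.11 192.0.2.12);
for my $ip (qw(192.0.2.200 203.0.113.5)) {
    printf "%-14s %s\n", $ip, is_blocked($ip) ? 'blocked' : 'allowed';
}
```

The output of `is_blocked` could feed either an SMTP-level rejection or a generated firewall rule, which are the two options the reply mentions.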

        Dropping at the packet level does break a few RFCs, the one I can recall presently being the section of RFC 2821 that each domain and host that accepts or routes mail should have a reachable postmaster address despite filtering (which almost nobody follows anyway, since sending to "postmaster" then just becomes an easy way to spam). The really accepted way to do it, though, is to return a 554 policy error with text like "Your network block has been spamming this server."

Re: PMiltering fun
by dwm042 (Priest) on Jun 23, 2008 at 19:40 UTC
    Tanktalus,

    This being postfix, you can set it up to authenticate against either MySQL or LDAP, and one good way to limit spam would be to authenticate the recipient address against a known database of users able to accept email.

    If it doesn't exist in your authoritative database, throw it away. I'm saying this because the majority of spam I was seeing when I did email for a living (I was an email admin for a small telecom) were generated by dictionary attacks, and this is a simple, trouble free way to get rid of all that.

    If this is a terminal email acceptor (as opposed to a forwarder), then you might want to grab one of the formulas for building an amavisd-spamassassin-clamav setup as well. Just be warned that this software trio is power hungry and not suitable for old hardware if you're seeing large volumes of mail. High volume receivers keep this trio on separate hardware from the original MX acceptor.
