Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

A SpamAssassin-Enabled POP3 Proxy

by McD (Chaplain)
on Aug 27, 2002 at 01:56 UTC ( #193065=CUFP: print w/ replies, xml ) Need Help??

I dig SpamAssassin - a Perl based spam filtering suite.

It's somewhat biased towards a Unix flavored mail server soloution though, and I read my mail at home on Win32 with a POP3 mail client. Itch, meet scratch.

This is a simple POP3 protocol proxy which will intercept mail messages en route from a POP3 server and wash them thru SpamAssassin before returning them to the mail client (Mozilla, Outlook, Eudora, etc). It's feature rich (for what it is - it's pretty simple at the end of the day) but it does require some know-how to get set up. There's also a slew of cool SpamAssassin features that it doesn't support, but might someday. It was written and debugged with Win32 use in mind, but I suspect the code is pretty portable to any modern Perl install - let me know if you have success on a non-Win32 platform with it!

There's a manpage with detailed instructions on installing, configuring, and using this proxy here, and here's a nice zip with the script and related goodies: pop3proxy.zip

The code is posted below - it's the same as the zip, but I've removed the POD directives for brevity's sake. See the manpage if you plan on setting about using this.

Peace,
-McD

#!perl -w # Pop3proxy - a SpamAssassin enabled POP3 proxy designed for Win32 # users. use strict; # Set this to zero to turn off all debugging statements. Set to 1 for # basic debugging, which is pretty verbose, set it to 2 to add a dump # of key data structs on connect, set it to 3 to add a dump of every # read/write we do. (Oy) use constant DEBUGGING => 1; # Seems that SpamAssassin wants to remove the dependency on # Time::HiRes. I only need it for measuring performance, so I'll only # include it if it's available. Have to eval the "use constant" # statements to avoid redefinition warnings. # # I use constants for debugging switches because I believe they get # optimized out by the compiler if they're false. I could be wrong. BEGIN { eval "use Time::HiRes"; if ($@) { eval "use constant TIMERS => 0"; } else { eval "use constant TIMERS => 1"; } } # A set of enumerated reasons why we're snarfing a multiline response # for a socket. use constant RETR => 1; use constant TOP => 2; use constant CAPA => 3; use IO::Socket; use IO::Select; use FindBin; use Mail::SpamAssassin; use Getopt::Long; ######################### # A BUNCH OF EVIL GLOBALS ######################### # Set this to be a file that will contain the debug log. Set to an # empty string to debug to STDOUT. --logfile command line arg sets. my $logfile = 'pop3proxy.log'; # Hostmap - keys are ports to listen to on localhost, values are # hostname:port to proxy connections on the key port to. Set up by # the command line --host arg or by the hostmap.txt config file. # # A simplest case - you get your mail from a server server named # pophost.isp.com, on the standard POP3 port (110): # # my %hostmap = ( 110 => 'pophost.isp.com:110' ); # # ...And you change your mail client to get mail from localhost. # # Fancier case - you pop mail off of two hosts, pophost.isp.com and # mail.yetanother.org: # # my %hostmap = ( # 817 => 'pophost.isp.com:110', # 818 => 'mail.yetanother.org:110', # ) # # In that case, the proxy listens to TWO sockets on localhost - 817, # and 818, proxying off to two separate remote hosts as indicated. # # Note that for this to work, you need to be able to tell your mail # client to connect to two different ports on localhost to find the # proxy - namely, 817 for pophost.isp.com and 818 for # mail.yetanother.org. Some mail clients, like Netscape 4.5's, won't # let you specify the port to use for a pop3 connection. Oops. # Others, like Mozilla 1.0, will let you set the port, but won't allow # two servers to be on the same host (localhost in this case). You # can work around THAT by creating another alias for localhost in your # C:\Windows\Hosts file: # # 127.0.0.1 localhost MyHostName # # ...and then configuring one account for localhost:817 and the other # for MyHostName:818 my %hostmap = (); # Respect_byte_count - If TRUE, then we do not alter the byte count of # the message when marking it as spam - instead, we overwrite portions # of the headers, such as changing the first five characters of the # Subject: line to "*SPAM*" (a shortened form of SpamAssassin's famous # subject prefix). Set by the command line --nopad arguement. # # This, because under certain conditions the POP3 protocol indicates # message and mailbox sizes, and the safe thing is not to enlarge # those sizes while marking a message as spam. # # If there is no Subject: line in the mail headers (there doesn't have # to be, after all) or if it's less than 5 bytes, then we use the # first Received: line we find instead. # # Setting this value to FALSE (0) seems to work with most mail # clients, and it causes us to proxy back the mail as it's been # modified by SpamAssassin, which gives you a wonderful great lot of # info about WHY it's labeled as spam, and also labels it clearly and # beyond doubt, and defangs the MIME contents, etc, etc - but it # *could* break the mail client. Harumph. my $respect_byte_count = 0; # If true, we let the POP3 "TOP" command go thru to the server, # otherwise, we don't proxy the TOP command and return an error back # to the client. Set by the command line --allowtop arguement. # # TOP is specified as an optional command, it shows you the headers of # a mail message and a configurable number of lines of the body. The # idea is that you can sort of "screen" what you choose to download or # not before you do. All well and good, but our spam filtering can # cause this to break when we scan the actual message during retrieval # and potentially modify or add to the headers, such as changing the # subject line to start with *****SPAM***** or something. # # This breaks the protocol a little and could have unusual or possibly # even destructive consequences. Since it's an optional part of the # protocol, most mail clients should be coded to work without it, # hence, by default, we avoid the problem by turning it off. my $allow_top = 0; # Here's the problem with using SpamAssassin in this way - given a # large enough message, he will take a LONG time to scan it, where # long is like sixteen minutes on a P-II 350 running Linux for a 3MB # text message. Maybe that was a degenerate case of some sort, but # there it is. If SpamAssassin takes long enough to scan a message, # the mail client (who's not getting any data in response to his RETR # command during all this) will eventually time out. Sockets close, # data is lost, etc, etc. Very bad, very difficult to fix and get on # with your life if you have a large mail message on the server that # keeps causing this. # # Hence, this config parameter. If a message exceeds this size while # we're snarfing it, we'll abandon the snarf, start passing the data # back to the client, and no scan of the message by SpamAssassin will # be performed. # # Setting this to zero turns this behavior off - all messages will be # scanned, regardless of size. # # I chose a 250K default for this value after analyzing a few months # worth of spam - 1500 messages. The average size was about 9K, the # largest was 110K. I figured double the largest would allow most of # the spam we see today to get scanned, without trouble. # # This has the added side effect of keeping our memory usage down - # that scan of a 3MB message took 86MB worth of memory. That's not # such a hot idea for a daemon. my $max_scan_size = 250000; # If we're invoked with a logfile for output using ActiveState's # wperl.exe, we can effectively hum along in the background. Nice. I # don't want to send the user to Task Manager to shut us down, and # under Win98 at least you get the nasty "application not responding" # dialog box because I'm busy waiting for to select a socket, so # instead we have this - a port that we listen on for the purposes of # exiting. Any connection to it from localhost, and I'll get out of # town. # # The default is 9625 (which is otherwise unused). Set this to zero # to disable this behavior. my $exit_port = 9625; # Note CRLF == \015\012 my $no_top = "-ERR Not supported by proxy\015\012"; # %peer - mapping of client socket => server socket, and vice versa. # # Keys are stringified references to IO::Socket objects, values are # actual references to the same. It's a little ugly to contemplate, # but it works just dandy. # # The Peer mapping is removed when the peer is closed. Thus, if # you're reading data on $socket: # # The destination of this data is $reading_buf{$peer{$socket}}, and, # If there is no destination any more, there's no point in reading the # data, so shut down, and, # If you read some data, add the $peer{$socket} to the Writeable set, # because now you want to write something to him. # # And, if you're writing to $socket, # # The data is in $writing_buf{$socket}, and, # Once all the data is written, you should close $socket if # $peer{$socket} is missing. my %peer; # %is_client - stringified IO::Socket references for keys, true or # false values based on whether that socket is connected to a client # or the server. my %is_client; ################# # Buffers galore. ################# # The general flow of data is: # # data from $socket -> $peer = $peer{$socket} -> read data into # $reading_buf{$peer} -> hook protocol, snarfing to $message{$peer} if # needed -> move data into $writing_buf{$peer} -> write data to $peer # %reading_buf - keys are sockets, value is buffer of data read from t +hat # socket's peer, waiting to be proxy'd to the socket. my %reading_buf; # %writing_buf - keys are still sockets, value is data from the # %reading_buf buffer which is now ready for writing to the socket. my %writing_buf; # Hash of socket => buffer, buffer is filled up with the message being # snarfed. Then the buffer is scanned and modified, then copied into # $writing_buf{$socket} and flushed back to the client. my %message; # Hash of socket => enums, set to the reason we're snarfing a # multiline response into %message_for array for this socket. Set to # zero (false) if we're NOT snarfing this data. my %snarfing; # Hash of Client socket => queue of commands the client has requested. my %client_commands; # Hash of listening sockets - keys are stringified socket object refs, # values are the host:port we should proxy connections on that socket # to. my %proxyto; # Flags - toggled on and off to indicate if we're reading a multiline # response or not. Keys are sockets. my %reading_multiline_response; # Hash - keys are sockets, values are HiRes timer floats. Used to # time downloads. my %snarf_start; ######## # "Main" ######## # Get in your directory chdir "$FindBin::RealBin"; read_config() if -s "./hostmap.txt"; my $cl_proxyto; my $helpflag = 0; usage() unless GetOptions("logfile:s" => \$logfile, "nopad" => \$respect_byte_count, "allowtop" => \$allow_top, "maxscan=i" => \$max_scan_size, "exitport=i" => \$exit_port, "host=s" => \$cl_proxyto, "help" => \$helpflag, ); usage() if $helpflag; if ($cl_proxyto) { warn "WARNING: $cl_proxyto overrides hostmap.txt entry: $hostmap{110 +}\n" if exists $hostmap{110}; # We're nice to command line users. If you tag a :port onto your # hostname, that's cool, otherwise, you get :110 for free. $cl_proxyto .= ':110' unless $cl_proxyto =~ /:\d+$/; $hostmap{110} = $cl_proxyto; } die "No proxy host! Use --host or hostmap.txt\n" unless keys %hostmap +; # Prevent concurrent proxies - kill any previous instance if (IO::Socket::INET->new(PeerAddr => 'localhost', PeerPort => $exit_port, Proto => "tcp", Type => SOCK_STREAM)) { warn "WARNING: Existing proxy killed\n"; } if ($logfile) { # Redirect stdout and stderr to logfile if specified. # Windows strangeness - you can't reopen STDOUT/STDERR successfully # under wperl.exe unless you've already closed it. Go figure. close STDOUT; close STDERR; open(STDOUT, "> $logfile") or die "Can't redirect stdout: $!"; open(STDERR, ">&STDOUT") or die "Can't dup stdout: $!"; } $| = 1; # The SpamAssassin scanner. # # We tell it to use ./user_prefs, and not to try to copy in a default # if it's not there (because he goes looking for a template file in # all the usual places to copy over - all the usual *UNIX* places). # We tell it to run only local tests, because otherwise, you'll get # complaints and timeouts when it can't find a dcc app to run or it # can't find a DNS server or the network is the wrong color or # whatever. In theory, you could get away with DNS RBL checks, but # I've had a hard time making Net::DNS work on my Win32 setup. # Hackers welcome, best of luck. See notes at end of file. my $spamtest = Mail::SpamAssassin->new({ userprefs_filename => './user_prefs', dont_copy_prefs => 1, local_tests_only => 1, }); my $readable = IO::Select->new; my $writeable = IO::Select->new; # Create sockets to listen on. foreach my $port (keys %hostmap) { my $listener = IO::Socket::INET->new(LocalPort => $port, Listen => 5 +, Reuse => 1); die "Can't create socket for listening: $!" unless $listener; print "Listening for connections on port $port (proxy $hostmap{$port +})\n" if DEBUGGING; $readable->add($listener); $proxyto{$listener} = $hostmap{$port}; } # Create the "exit socket" - any connection on this socket from # localhost will cause us to exit. my $exit_socket; if ($exit_port) { $exit_socket = IO::Socket::INET->new(LocalPort => $exit_port, Listen + => 1, Reuse => 1); $readable->add($exit_socket); } while(1) { my ($toread, $towrite) = IO::Select->select($readable, $writeable); foreach my $socket (@$toread) { if ($socket == $exit_socket) { all_done($socket); next; # Just in case it wasn't from localhost } # Is it a new connection? if (exists $proxyto{$socket}) { dump_data_structs() if (DEBUGGING > 1); # Open connection to remote, add to readable set, map it # to this new client connection. my $remote = IO::Socket::INET->new(PeerAddr=>$proxyto{$socket}); $readable->add($remote) if $remote; if (not $remote) { # Break the incoming new client off, create a new # listener to try again. print "Connect to remote: $proxyto{$socket} FAILED: $@\n" if D +EBUGGING; my $port = $socket->sockport; $socket->close; $readable->remove($socket); my $listener = IO::Socket::INET->new(LocalPort => $port, Listen => 5, Reuse => 1); die "Can't create socket for listening: $!" unless $listener; $readable->add($listener); $proxyto{$listener} = $hostmap{$port}; next; } # Accept the connection and add it to our readable list. my $new_sock = $socket->accept; $readable->add($new_sock) if $new_sock; die "Can't create new socket for incoming connection: $!" unless $new_sock; # Create proxy/peer mapping, set client/server indicators, # create buffers, etc. $peer{$new_sock} = $remote; $peer{$remote} = $new_sock; $is_client{$new_sock} = 1; $is_client{$remote} = 0; $message{$new_sock} = ''; $snarfing{$new_sock} = 0; # The first thing we'll see is a response to no command at # all - "+OK Welcome to foobar.com" - so we seed the # command queue with a dummy command to eleminate warnings # later on. $client_commands{$new_sock} = [('none')]; foreach ($new_sock, $remote) { $reading_buf{$_} = ''; $writing_buf{$_} = ''; } if (DEBUGGING) { print "\nNew connection:\n"; print "From: ", $new_sock->peerhost, ':', $new_sock->peerport,"\n"; print "To: ", $remote->peerhost, ':', $remote->peerport, "\n"; } } else { # It's an established connection my $key; if (DEBUGGING) { if ($socket->connected) { $key = $socket->peerhost . ':' . $socket->peerport; } else { $key = "$socket"; } } my $proxy; # Which socket we're going to proxy this data to if (exists $peer{$socket}) { $proxy = $peer{$socket}; } else { # No peer. print "\n$key - peer gone on read" if DEBUGGING; # No need to keep hearing about how it's ready to be # read - we've got no use for subsequent data. $readable->remove($socket); # Tear down connection, unless there's data waiting to # be written to it - in that case, we'll catch it in # writeables and close it when we're done. if (! data_waiting($socket)) { print ", nothing to write, closing socket" if DEBUGGING; clean_up($socket); } print "\n" if DEBUGGING; next; } # Why 4096 bytes? I dunno. You got a better buffer size? unless (my $n = sysread($socket, $reading_buf{$proxy}, 4096, length($reading_buf{$proxy}))) { warn "sysread: $!\n" if not defined $n; # Shut down the socket print "\n$key - socket close on read" if DEBUGGING; clean_up($socket); # Remove the proxy map delete $peer{$socket}; delete $peer{$proxy}; if (! data_waiting($proxy)) { # No pending data - tear down the peer as well. print ", closing peer too" if DEBUGGING; clean_up($proxy); } print "\n" if DEBUGGING; next; } if (DEBUGGING > 2) { $is_client{$socket} ? print "C< " : print "S< "; print "\n"; } # Got data from a socket. Go do something clever with it. run_hooks($proxy); } } # End of readables # Next, do something with each socket ready to write. Like, write # to it. foreach my $socket (@$towrite) { my $key; if (DEBUGGING) { if ($socket->connected) { $key = $socket->peerhost . ':' . $socket->peerport; } else { $key = "$socket"; } } my $wrote = syswrite($socket, $writing_buf{$socket}) or do { warn "syswrite: $!\n"; print "\n$key - socket close on write" if DEBUGGING; clean_up($socket); # Remove the proxy map if (exists $peer{$socket}) { my $proxy = $peer{$socket}; delete $peer{$proxy}; delete $peer{$socket}; if (! data_waiting($proxy)) { print ", closing peer too" if DEBUGGING; clean_up($proxy); } } print "\n" if DEBUGGING; next; }; if (DEBUGGING > 2) { $is_client{$socket} ? print "C> " : print "S> "; print "\n"; } # Scrub the just-written data from the buffer substr($writing_buf{$socket}, 0, $wrote, ""); # All done writing? if (! length($writing_buf{$socket})) { $writeable->remove($socket); if (! exists $peer{$socket}) { # No peer? Tear down connection. print "\n$key - peer gone after write, closing\n" if DEBUGGING +; clean_up($socket); next; } } } # end of writeables } # data_waiting($socket) # # Returns true if there's any data waiting to be proxy'd to this socke +t. # # Reason this works - we only check data_waiting() on a socket *after* # we've closed it's peer. Closing the peer in clean_up(), below, will # have the effect of flushing any pending %message buffers (and # %reading_buf, for that matter) to %writing_buf, and hence, all the # data which is "waiting" is, in fact, guaranteed to now be waiting. sub data_waiting { my $socket = shift; return (length($reading_buf{$socket}) or length($writing_buf{$socket +})); } # clean_up($socket) # # Given a socket, close it, stop selecting it for anything, clean up # all our structs that refer to it, set the peer if any to flush # buffers. sub clean_up { my $socket = shift; # This socket is history. If there's a peer, then that peer # currently has all the data it's ever gonna get. Flush that data # into the writing_buf and add it to the writeable set. # # Ok, technically, this *could* burn you if what you were caching # away in %message was a multiline TOP response that you were # going to discard anyway, and now I'm going to flush it to the # client, instead. Look, the client is going to get an error # condition *anyway* because the darn socket is GONE, man, just # like that, in the middle of a multiline response! I will # venture to say that no harm will come of this - but if it does, # we can always make this behave a lot more like a "last ditch" # run_hooks() session. if (exists $peer{$socket}) { my $proxy = $peer{$socket}; $writing_buf{$proxy} .= $message{$proxy} if exists ($message{$prox +y}); $writing_buf{$proxy} .= $reading_buf{$proxy}; $reading_buf{$proxy} = ''; $message{$proxy} = ''; $snarfing{$proxy} = 0; if (length ($writing_buf{$proxy})) { $writeable->add($proxy); print "\nFlushing peer on close\n" if DEBUGGING; } } # Note that you can apparently remove a socket more than once from # an IO::Select set. Also you can delete a key/value pair from a # hash that doesn't exist. Love Perl. DWIM. $readable->remove($socket); $writeable->remove($socket); $socket->close; delete $reading_buf{$socket}; delete $writing_buf{$socket}; delete $is_client{$socket}; delete $snarfing{$socket}; delete $message{$socket}; delete $client_commands{$socket}; delete $reading_multiline_response{$socket}; delete $snarf_start{$socket}; } # run_hooks($socket) # # This is where we hook the POP3 protocol. Called whenever a socket # gets new data in it's buffer, we can do whatever you want here. The # default is to wait until there's a \n in the %reading_buf buffer, th +en (in # a loop) move all those bytes into the %writing_buf buffer (giving us + the # window to look at a full line of I/O), then add the socket to the # writeable set, thereby causing the contents of %writing_buf to get # flushed to the socket. # # Under certain conditions, though, we'll want to intercept the # protocol, at which point we snarf the data off into %message until # it's done, then we look at it or replace it or something, and THEN # we ship it off to %writing_buf for flushing to the client. # # Client commands are pushed onto a queue of commands, server # responses shift commands off that queue. This way we can support # pipelining client/servers, per rfc 2449 # # Note - logically, the %peer mapping must be intact when you get # here. The main loop enforces this. You may assume that # $peer{$socket} will exist and be valid in this routine. my $pos; sub run_hooks { my $socket = shift; # This loop looks for the first occurance of a \n in a string, # then MOVES all of the string up to and including the \n into the # output buffer and adds the socket to the set of sockets we'd # like to write to. Then it loops looking for another \n. # # Just before the move, you can examine the beginning of # $reading_buf{$socket} to see what kinds of interesting thingies mi +ght # be in there, in the confidence that it's a real full line of # data from the protocol. You can say things like: # # $reading_buf{$socket} =~ /^(.*)$/m # /m lets $ match next to embe +dded \n $pos = -1; while (($pos = index($reading_buf{$socket}, "\012", 0)) > -1) { # Right here you can examine $reading_buf{$socket} if ($is_client{$socket}) { # Hooks here for data from the server to the client # Responses from the server are interesting. They can be # single line, in which case they MUST start with "+OK" or # "-ERR", or else they're part of a multiline response, # such as a LIST or RETR command, in which case they MUST # end with a CRLF.CRLF. if ($reading_buf{$socket} =~ /^(\+OK|-ERR)/i and not $reading_multiline_response{$socket}) { # Response to a command my $command = shift @{$client_commands{$socket}}; print $peer{$socket}->peerhost . ':' . $peer{$socket}->peerport . " (Server) said $1 to $command\n" if DEBUGGING; # Always include the greeting line in the log. if (DEBUGGING and $command eq 'none') { print $reading_buf{$socket}; } die "Assertion failed: snarfing outside multiline response" if ($snarfing{$socket}); # Only interested in snarfing successful response - # none of the error responses are multiline. if (substr ($1, 0, 1) eq '+') { if ($command =~ /^TOP$/i and not $allow_top) { print "Snarfing TOP response\n" if DEBUGGING; $snarfing{$socket} = TOP; } if ($command =~ /RETR/i) { print "Snarfing RETR response\n" if DEBUGGING; $snarf_start{$socket} = Time::HiRes::gettimeofday if TIMERS; $snarfing{$socket} = RETR; } if ($command =~ /CAPA/i) { print "Snarfing CAPA response\n" if DEBUGGING; $snarfing{$socket} = CAPA; } } } elsif ($reading_buf{$socket} =~ m|^\.\015?\012|) { # End of a multiline response $reading_multiline_response{$socket} = 0; if ($snarfing{$socket}) { print "Detected end of snarfed multiline\n" if DEBUGGING; printf "Download took %.8f seconds\n", Time::HiRes::gettimeofday - $snarf_start{$socket} if (DEBUGGING and TIMERS); # At this point, $message{$socket} contains the # full multiline response, +OK up to but not # including this trailing ".CRLF". if ($snarfing{$socket} == RETR) { # Right here, $message{$socket} is ripe for # scanning. scan_mail(\$message{$socket}); $writing_buf{$socket} .= $message{$socket}; } elsif ($snarfing{$socket} == TOP) { # Eat the .CRLF, add the error message to the # output buffer, flush said output buffer, # clean up your structs and move on. substr($reading_buf{$socket}, 0, $pos+1, ""); $writing_buf{$socket} .= $no_top; $message{$socket} = ''; $snarfing{$socket} = 0; $writeable->add($socket); next; } elsif ($snarfing{$socket} == CAPA) { # Strips out the TOP response, if any. $message{$socket} =~ s/\012TOP[^\012]*\012/\012/ig if not $allow_top; # Strips out the SASL response, if any. $message{$socket} =~ s/\012SASL[^\012]*\012/\012/ig; $writing_buf{$socket} .= $message{$socket}; } $message{$socket} = ''; $snarfing{$socket} = 0; } } else { # Part of a multiline response. Flip the ready flag, # you won't be ready to see another response until you # see your CRLF.CRLF $reading_multiline_response{$socket} = 1; } # At this point, snarf data into %message if snarfing and # move along. if ($snarfing{$socket}) { $message{$socket} .= substr($reading_buf{$socket}, 0, $pos+1, ""); # Check size of snarfed message and stop snarfing if it's # getting too big - see notes at $max_scan_size. if ($max_scan_size != 0 and length($message{$socket}) > $max_scan_size) { print "Message exceeding max scan size, abandoning snarf\n" if DEBUGGING; $writing_buf{$socket} .= $message{$socket}; $message{$socket} = ''; $snarfing{$socket} = 0; $writeable->add($socket); } next; } } else { # Hooks here for data from the client to the server # Spot the client's command, add to the queue. my ($command) = $reading_buf{$socket} =~ /^(\S+)\s/; print $peer{$socket}->peerhost . ':' . $peer{$socket}->peerport +. " (Client) said $command\n" if DEBUGGING and $command; # AUTH is a special case, see discussion elsewhere. Must # not have any commands in the queue, and we reply back to # the socket immediately with an error. if ($command and $command =~ /^AUTH$/i) { if (scalar(@{$client_commands{$peer{$socket}}})) { die "I so can't cope with AUTH commands while pipelining"; } print "AUTH Rejected\n" if DEBUGGING; substr($reading_buf{$socket}, 0, $pos+1, ""); # Note - $no_top is a generic -ERR response, works fine. $writing_buf{$peer{$socket}} .= $no_top; $writeable->add($peer{$socket}); next; } push (@{$client_commands{$peer{$socket}}}, $command) if $command +; } # Default action after all your shots at hooking and magic, # etc.: Move the data to the writing buffer, and set it up to # get written. $writing_buf{$socket} .= substr($reading_buf{$socket}, 0, $pos+1, +""); $writeable->add($socket); } } sub dump_data_structs { # Dump your current key per-connection data structs print "\nExisting proxy/peer mappings:\n"; print map "$_ => $peer{$_}\n", keys %peer; print "\nExisting is_client flags:\n"; print map "$_ => $is_client{$_}\n", keys %is_client; print "Existing socket reading_buf buffers:\n"; print map "$_ => $reading_buf{$_}\n", keys %reading_buf; print "Existing socket writing_buf buffers:\n"; print map "$_ => $writing_buf{$_}\n", keys %writing_buf; print "Existing message buffers:\n"; print map "$_ => $message{$_}\n", keys %message; print "Existing snarfing flags:\n"; print map "$_ => $snarfing{$_}\n", keys %snarfing; print "Existing command queues:\n"; print map "$_ => @{$client_commands{$_}}\n", keys %client_commands; print "Existing reading_multiline_response flags:\n"; print map "$_ => $reading_multiline_response{$_}\n", keys %reading_multiline_response; print "Existing snarf_start values:\n"; print map "$_ => $snarf_start{$_}\n", keys %snarf_start; } # @mail - array of lines of a mail message. Some notes on memory # usage here: # # Big mail messages getting copied about will chew up memory right # quick. I start with one copy of the message built up in a scalar # buffer, then I need a second copy, broken out into an array of # lines, for Mail::SpamAssassin::NoMailAudit to chew on. That's two # copies. # # I can save a copy's worth of memory by MOVING the lines from the # scalar buffer into the array - but then, once SpamAssassin is done # chewing on them, I have to put them BACK into the scalar buffer. If # I'm not removing them from the SpamAssassin::NoMailAudit object as I # do that, I'm going to wind up with a second copy of the mail # *anyway*. And that kind of removal is nasty and creeps inside of # the objects encapsulation, where I really ought not go. # # NoMailAudit::as_string() returns a copy of the mail as a string, but # to do so, it creates a big ol' scalar on the stack to return. # Simple, but it costs a THIRD chunk of memory the size of the # message. my @mail; sub scan_mail { my $mailref = shift; my $bytecount = length $$mailref; $$mailref =~ s/\012\.\./\012\./g; # un-byte-stuff @mail = split /^/, $$mailref; my $response = shift @mail; # SpamAssassin::NoMailAudit adds a Unix mbox From_ line, unless # you construct your NoMailAudit message with the (ahem, # undocumented) add_From_line param set to false. That From_ # kinda breaks the protocol - the client isn't expecting mbox, # he's expecting raw 822 mail - so we leave it out. my $message = Mail::SpamAssassin::NoMailAudit->new(data => \@mail, add_From_line => +0); my $start; $start = Time::HiRes::gettimeofday if TIMERS; my $status = $spamtest->check($message); printf "Spam check took %.8f seconds\n", Time::HiRes::gettimeofday - $start if (DEBUGGING and TIMERS); my $id = $message->get('Message-id') || '*none*'; print "$bytecount bytes, ", $status->is_spam() ? 'SPAM' : 'NOT spam', ", Message-id: $id\n" if DEBUGGING; print $status->get_report() if DEBUGGING and $respect_byte_count; $status->rewrite_mail() unless $respect_byte_count; if ($status->is_spam ()) { if ($respect_byte_count) { # DAN - danger, you don't know if you're in the headers or not. $$mailref =~ s/\012Subject: [^\012]{6}/\012Subject: *SPAM*/i or $$mailref =~ s/\012Received: [^\012]{6}/\012Received: *SPAM* +/i; } else { # What as_string() does as of SpamAssassin v2.31: # return join ('', $self->get_all_headers()) . "\n" . # join ('', @{$self->get_body()}); $$mailref = $response; $$mailref .= $message->get_all_headers(); $$mailref .= "\015\012"; foreach my $line (@{$message->get_body()}) { $$mailref .= $line; } # SA's markups end with \n instead of CRLF's. Gotta # change those here. $$mailref =~ s|(?<!\015)\012|\015\012|g; } } else { if (not $respect_byte_count) { $$mailref = $response; $$mailref .= $message->get_all_headers(); $$mailref .= "\015\012"; foreach my $line (@{$message->get_body()}) { $$mailref .= $line; } # SA's markups end with \n instead of CRLF's. Gotta # change those here. $$mailref =~ s|(?<!\015)\012|\015\012|g; } } $status->finish(); $$mailref =~ s/\012\./\012\.\./g; # byte-stuff } sub all_done { my $socket = shift; my $new_sock = $socket->accept; if ($new_sock->peerhost eq '127.0.0.1') { print "Connection on exit socket, exiting\n" if DEBUGGING; exit; } else { print "Connection on exit socket from non-local host!\n" if DEBUGG +ING; $new_sock->close; } } sub read_config { open (CONFIG, "./hostmap.txt") or die "Can't read hostmap.txt: $!\n" +; # Straight from the cookbook 8.16 while (<CONFIG>) { chomp; # no newline s/#.*//; # no comments s/^\s+//; # no leading white s/\s+$//; # no trailing white next unless length; # anything left? my ($port, $proxyto) = split(/\s*=\s*/, $_, 2); $hostmap{$port} = $proxyto; } } sub usage { print <<EOT; Usage: $0 --host some.host.name [options] Options include: --logfile filename Use filename as the log file. Default is pop3proxy.log. If the filename is omitted, log to STDOUT. --nopad If nopad is specified, then message sizes will not be changed as + a result of spam scanning. The default is to add to the message s +ize. --allowtop If top is specified, then the POP3 "TOP" command will be passed +through to the server. The default is to reject client TOP commands wit +h an error message. --maxscan bytes Messages which exceed this size will not be scanned for spam. T +he default is 250000. Setting this to zero disables this behavior. --exitport port Any connection from localhost on this port will cause us to exit +. The default is 9625. Setting this to zero disables this behavio +r. EOT exit; } # Copyright (c) 2002, Dan McDonald. All Rights Reserved. # This program is free software. It may be used, redistributed # and/or modified under the terms of the Perl Artistic License # (see http://www.perl.com/perl/misc/Artistic.html)

Comment on A SpamAssassin-Enabled POP3 Proxy
Download Code
Re: A SpamAssassin-Enabled POP3 Proxy
by FoncÚ (Scribe) on Aug 27, 2002 at 21:08 UTC
    So, tell me...what kind of lag time are we looking at here? I can't run this right now to test, so I wondered if you had and checked that out.

    Just terminally curious. (-:

    FoncÚ
      Very little - the proxy seems to add fractional seconds per message (see the logs while it's running) but it might "feel" slower in your mail client due to the batched-then-dumped nature of things. The proxy buffers up the whole message, then the client gets it all at once, rather than trickling in. You might see jumpy progress bars as a result sometimes.

      One of the biggest lag problems with a SpamAssassin scan happens when SA spins off into the weeds on truly large messages. This may be an acceptable delay in a store-and-forward chain, but in a real time network proxy it's fatal. Pop3Proxy works around that (like many SA apps) by declining to scan messages over a certain size - 250K by default, but it's configurable.

      See the discussion on the manpage for more on that.

      Peace,
      -McD

Re: A SpamAssassin-Enabled POP3 Proxy
by guha (Priest) on Sep 04, 2002 at 15:48 UTC

    Great stuff McD!!

    Tried it a few days and I'm happy to say that my daily spam-quota has sunk to 3% of it's former level.

    Apart from that I must say that the manpage is very well written, in fact I think it's one of the best I've ever read. Informative, funny and well versed.

    PS. You can send the cheque to me by snail instead of email, I don't want SA to snarf it. ;-]

      guha writes:

      Tried it a few days and I'm happy to say that my daily spam-quota has sunk to 3% of it's former level.

      Thank SpamAssassin for that - the spam filtering is all about their hard work and dedication. All I do is get them the mail. :-)

      They just released 2.40 a day or two ago, and 2.41 will be close on it's heels with bugfixes it sounds like. I hope to update and qualify with that version over the coming weekend - some of the early stats on the 2.40 release's effectiveness are simply phenomenal.

      Apart from that I must say that the manpage is very well written, in fact I think it's one of the best I've ever read. Informative, funny and well versed.

      You flatter me, sir. :-)

      Peace,
      -McD

Re: A SpamAssassin-Enabled POP3 Proxy
by Anonymous Monk on Nov 01, 2002 at 12:46 UTC
    This is great - I've been struggling with getting SpamAssasin to run on WinXp and this got it all going in just a few minutes! Just one question - I've installed with SpamAssassin 2.43 and my POP3Proxy log is complaining like so: configuration file "/etc/mail/spamassassin/20_body_tests.cf" requires version 2.41 of SpamAssassin, but this is code version 2.43. Maybe you need to use the -c switch, or remove the old config files? Skipping this file at C:/Perl/site/lib/Mail/SpamAssassin/Conf.pm line 246. I does the same for files: 20_compensate.cf 20_head_tests.cf 20_meta_tests.cf 20_uri_tests.cf Is this something to worry about - or is there a fix? Thanks. BarryC.
      Barry C writes:

      This is great - I've been struggling with getting SpamAssasin to run on WinXp and this got it all going in just a few minutes!

      Glad it's working for you.

      Just one question - I've installed with SpamAssassin 2.43...

      I haven't tested against 2.43, but others have reported success.

      Based on the error message, it looks to me like you need to remove the old tests before upgrading to 2.43 - but I could be mistaken. Try zapping your entire pop3proxy directory and starting over - maybe save off your user_prefs and hostmap.txt (if any) first.

      Peace,
      -McD

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: CUFP [id://193065]
Approved by jarich
Front-paged by RMGir
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (15)
As of 2015-07-06 13:42 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The top three priorities of my open tasks are (in descending order of likelihood to be worked on) ...









    Results (74 votes), past polls