PerlMonks  

Re: Help with setting up spamc

by andal (Hermit)
on Jul 09, 2014 at 06:51 UTC ( [id://1092846] )


in reply to Help with setting up spamc

I need to send $message_received to spamc and capture its output in a variable (preferably) so I can get the spam score. I know I can just back quote a system command to capture stdout to a variable, but how can I do both the stdout and the stdin handling here? This should be simple, but I am just missing it...

A somewhat low-level approach in Perl would be:

    my $pid = open(CHLD, "-|");
    die "Failed to fork: $!\n" unless defined $pid;
    if ($pid == 0) {
        die "Failed to run spamc: $!" unless open(PROC, "|spamc");
        print PROC "My Arguments";
        close(PROC);
        exit(0);
    }
    while (<CHLD>) {
        # collect the input
    }
    close(CHLD);

The above assumes that spamc writes all of its output to STDOUT and then simply exits. The approach is slow because two forks are involved. I don't know anything about SpamAssassin, but if you have spamd (the daemon), there should be some network protocol for talking to it. If your program spoke that protocol directly, you'd save the time spent on the forks.
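For comparison, here is a minimal single-fork sketch using the core IPC::Open2 module, which gives you both the child's stdin and its stdout. The helper name filter_through is made up for illustration, and the command run at the bottom is a stand-in (the Perl interpreter upper-casing its input) so the example runs without SpamAssassin installed; you would pass 'spamc' and whatever options you need instead.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: send $input to the stdin of @cmd and return
# everything the command writes to stdout, plus its exit status.
sub filter_through {
    my ($input, @cmd) = @_;

    # open2 forks once and connects two pipes to the child;
    # it dies if the fork or exec fails.
    my $pid = open2(my $out, my $in, @cmd);

    print {$in} $input;
    close($in);                          # EOF tells the child the input is complete

    my $output = do { local $/; <$out> } // '';   # slurp all of the child's stdout
    close($out);

    waitpid($pid, 0);                    # reap the child and fetch its status
    return ($output, $? >> 8);
}

# Stand-in command (an assumption, not spamc): upper-case the input.
my ($text, $status) = filter_through("hello\n", $^X, '-pe', '$_ = uc');
print $text;    # prints "HELLO"
```

Writing the whole input before reading anything is only safe when the child consumes all of its input before producing much output. A command that streams output while still reading can fill the pipe and deadlock; in that case you would need to interleave reads and writes (for example with select) or use a temporary file.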

Another point: it looks like SpamAssassin is slow even without the forks, so your best bet would be to process multiple emails in parallel.

Re^2: Help with setting up spamc
by SteveTheTechie (Novice) on Jul 10, 2014 at 00:15 UTC

    Ok, this looks very interesting. Have not fiddled with tee and forks for a while.

    I did not think of communicating with spamd directly--thought I had to use the spamc cmd line interface.

    Thanks!

      As I said, I don't know much about SpamAssassin. But a quick search brought up Mail::SpamAssassin::Client, which implements the protocol for talking to spamd.

      Note that the page for Mail::SpamAssassin says:

      If you wish to use a command-line filter tool, try the spamassassin or the spamd/spamc tools provided
      So I believe these tools are useful only when you have to use external commands, for example when you are programming in shell rather than in Perl.
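A rough sketch of talking to spamd through Mail::SpamAssassin::Client could look like the following. It is untested against a live daemon. The wrapper name check_with_spamd is hypothetical, and the host and port are assumptions (a spamd on localhost at port 783). The wrapper returns nothing when the module is not installed or the daemon does not answer a ping.

```perl
use strict;
use warnings;

# Hypothetical wrapper: ask spamd to score one message. Returns the hash
# ref produced by the client, or undef if the module or daemon is missing.
sub check_with_spamd {
    my ($message, %opt) = @_;

    # Load the client at run time so the sketch degrades gracefully
    # on machines without SpamAssassin.
    return unless eval { require Mail::SpamAssassin::Client; 1 };

    my $client = Mail::SpamAssassin::Client->new({
        host => $opt{host} // 'localhost',   # assumed spamd location
        port => $opt{port} // 783,           # assumed spamd port
    });
    return unless $client && $client->ping();

    # process() sends the message to spamd and returns a hash ref with
    # keys such as isspam, score and threshold, or undef on failure.
    return $client->process($message);
}

my $result = check_with_spamd("Subject: test\n\nHello there\n");
if (defined $result) {
    print "score $result->{score} of $result->{threshold}\n";
}
else {
    print "spamd not available\n";
}
```

This avoids forking spamc for every message, which is the saving mentioned earlier in the thread.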

      In general, to increase throughput, you should make the processing of each message as independent as possible, so that one message does not have to wait for another. That usually means each message handler should run in a separate process or a separate thread. Using a separate spamd helps only because it internally uses multiple processes/threads to handle messages; if you feed it your messages one by one, that benefit is lost. Conversely, if you already handle your messages in separate processes/threads, it does not make sense to move SpamAssassin into yet another process, because you only add the overhead of communicating with that process.
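To make the "one process per message" idea concrete, here is a small sketch with plain fork and pipes and a cap on the number of simultaneous workers. Both process_in_parallel and handle_message are hypothetical names; handle_message just returns the message length as a placeholder for the real spam check.

```perl
use strict;
use warnings;

# Hypothetical per-message handler; a real one would run the spam check.
sub handle_message {
    my ($msg) = @_;
    return length $msg;
}

# Run handle_message on every message, at most $max_workers at a time.
# Returns the results in the same order as the input messages.
sub process_in_parallel {
    my ($max_workers, @messages) = @_;
    my (%reader, %index_of, %result);    # keyed by pid, pid, message index
    my $next = 0;

    while ($next < @messages or %reader) {
        # Start new workers until the cap is reached.
        while ($next < @messages and keys(%reader) < $max_workers) {
            pipe(my $r, my $w) or die "pipe failed: $!";
            my $pid = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {             # child: do the work, report, exit
                close $r;
                print {$w} handle_message($messages[$next]);
                close $w;
                exit 0;
            }
            close $w;                    # parent keeps only the read end
            $reader{$pid}   = $r;
            $index_of{$pid} = $next++;
        }

        # Block until any worker exits, then collect what it wrote.
        my $pid = wait();
        last if $pid == -1;
        next unless exists $reader{$pid};
        my $fh = delete $reader{$pid};
        $result{ $index_of{$pid} } = do { local $/; <$fh> };
        close $fh;
    }
    return map { $result{$_} } 0 .. $#messages;
}

my @lengths = process_in_parallel(2, "a", "bbb", "cc");
print "@lengths\n";    # prints "1 3 2"
```

The parent reads a worker's result only after that worker has exited, which is fine for small results such as a score. If a worker had to return more data than a pipe buffer holds, the parent would have to read before waiting.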
