in reply to Finding e-mail headers

If you have slurped your headers into a scalar, it's easy to split them into an array with multiline headers correctly stitched back together:

my $header = <<'HEADER';
Return-Path: <NITAIGOURANGA@AOL.COM>
X-Original-To: grinder@example.com
Delivered-To: grinder@example.com
Received: from RX504Second (ACBC197E.ipt.aol.com [172.188.25.126])
    by example.com (Postfix) with SMTP id C5960A94C
    for <grinder@example.com>; Mon, 30 Jun 2003 13:18:36 +0200 (CEST)
From: "GOURANGA" <NITAIGOURANGA@AOL.COM>
HEADER

# Split on newlines that are NOT followed by whitespace, so that
# continuation lines stay attached to the header they belong to.
my @header = split /\n(?!\s+)/, $header;

You might want to post-process each element to fold whitespace as well. A module will probably do a better job, but I find tr/\n\t / /s is usually good enough for my needs.
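As a rough sketch of that post-processing step (the sample header text here is made up for illustration, and this is not a full RFC 2822 unfolder), the tr/// can be applied to every element in one pass:

```perl
use strict;
use warnings;

# Hypothetical sample: one folded Received: field and one plain field.
my $header = "Received: from host.example.com\n"
           . "\tby example.com (Postfix) with SMTP id C5960A94C;\n"
           . "\tMon, 30 Jun 2003 13:18:36 +0200\n"
           . "From: sender\@example.com\n";

# Split on newlines that are not followed by whitespace, so
# continuation lines stay attached to their field.
my @header = split /\n(?!\s+)/, $header;

# Squeeze every run of newlines, tabs and spaces into a single space.
# tr/// works on $_, which is aliased to each array element in turn.
tr/\n\t / /s for @header;

print "$_\n" for @header;
```

Each element then holds one complete field on a single line, which makes later pattern matching much simpler.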

Once you have your headers, I would suggest looking at the Return-Path: header, which holds the envelope sender of the message, and at the Received: headers. These two items are very revealing when it comes to dealing with spew. The To: and From: headers are nearly always forged, or at least irrelevant, in spammers' messages.
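For what it's worth, here is a minimal sketch of pulling those two fields out of an already-unfolded header array. The addresses and host names are invented, and %field, $envelope_sender and @received are just names I picked for the example:

```perl
use strict;
use warnings;

# Hypothetical unfolded headers, one complete field per element.
my @header = (
    'Return-Path: <sender@example.com>',
    'Received: from relay.example.net by example.com (Postfix); Mon, 30 Jun 2003 13:18:36 +0200',
    'Received: from origin.example.org by relay.example.net; Mon, 30 Jun 2003 13:18:30 +0200',
    'From: "Someone" <someone@example.com>',
);

# Group values by lower-cased field name. A field such as Received:
# can appear several times, so each name maps to an array reference.
my %field;
for my $line (@header) {
    my ($name, $value) = $line =~ /^([^:\s]+):\s*(.*)$/s or next;
    push @{ $field{ lc $name } }, $value;
}

my ($envelope_sender) = @{ $field{'return-path'} || [] };
my @received          = @{ $field{'received'}    || [] };

print "Envelope sender: $envelope_sender\n" if defined $envelope_sender;
print "Hop: $_\n" for @received;
```

The Received: values come out in the order they appear in the message, so the first one is the hop closest to your own server and usually the only one you can trust.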

_____________________________________________
Come to YAPC::Europe 2003 in Paris, 23-25 July 2003.

Re: Re: Finding e-mail headers
by AssFace (Pilgrim) on Jun 30, 2003 at 18:43 UTC
    I'll have a try at that.

    In terms of the spam, I have SpamAssassin already doing that. The stats I'm interested in are only about our own users: seeing who is getting the most ham/spam.
    I don't particularly care who the spam is from for the exact reason that you mention.

    -------------------------------------------------------------------
    There are some odd things afoot now, in the Villa Straylight.
      It may be easier to parse the entries from /var/log/maillog or /var/adm/messages (depending on which system you are running SpamAssassin on), build a hash keyed on message ID that holds the to/from/spam rating, and normalize from there.

      -Waswas
        Sorry about this double post. My net connection is acting up and I must have reloaded the submit page. Apologies.

        This is SpamAssassin on Win32 that I hacked into Exchange via an EventSink. So all of this is on Windows 2000.
