PerlMonks  

Most of the email spam I get is:

by VSarkiss (Monsignor)
on Dec 31, 2004 at 05:30 UTC ( [id://418471]=poll )

Rolex knockoff
[bar] 66/13%
Nigerian 419 scam
[bar] 51/10%
Phony lottery winner
[bar] 12/2%
V1a6ra or other meds
[bar] 205/40%
Body part enlargement
[bar] 56/11%
Pr0n sites
[bar] 50/10%
CPAN bug reports
[bar] 14/3%
Undecipherable
[bar] 62/12%
516 total votes
Replies are listed 'Best First'.
Re: Most of the email spam I get is:
by jonadab (Parson) on Dec 31, 2004 at 14:14 UTC

    Let's see...

    • Character sets I can't read count as undecipherable, right? 306MB and counting. 166MB of that is GB2312 alone. (This is since August 27, 2002.) The various ks_c_ charsets between them account for another 60MB.
    • Stuff that either doesn't specify what charset it's in, or that's theoretically in a character set I can potentially read (mainly UTF8, which I unfortunately can't filter because some people in the open-source community write English messages in it in preference to ASCII or Latin-1, for no discernible reason), but whose subject line contains either long strings of non-alphanumeric characters, or nothing but alphanumeric characters, probably also counts as undecipherable. A handful of these have long strings of punctuation in the subject, but most of them are Unicode messages written in a non-Latin writing system. 141MB since September 2003, when I wrote the rule.
    • That virus from a while back, "See the attached file for details", 235MB.
    • Assorted miscellany my filters didn't catch, 166MB (between 2004 April 23 and December 6; I start a new bin for this periodically so I can calculate the impact per-day and see how much it's increasing).
    • I did get one CPAN bug report once... for some reason I filed that under nnml:perl.* rather than under nnml:spam.*, go figure.
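
For the per-day figure mentioned for the unfiltered bin, the arithmetic can be sketched as below. This is only an illustration: the function name is made up, and it assumes the bin covers the span from April 23 up to (not including) December 6, 2004, which the post does not state exactly.

```python
from datetime import date

def mb_per_day(total_mb: float, start: date, end: date) -> float:
    """Average volume per day over the half-open interval [start, end)."""
    days = (end - start).days
    if days <= 0:
        raise ValueError("end must be after start")
    return total_mb / days

# Figures quoted in the post: 166MB between 2004 April 23 and December 6.
days = (date(2004, 12, 6) - date(2004, 4, 23)).days
rate = mb_per_day(166, date(2004, 4, 23), date(2004, 12, 6))
print(f"{days} days, about {rate:.2f} MB/day")
```

Comparing this rate from one bin to the next is what shows whether the unfiltered volume is growing.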

    The unfiltered stuff (which lands in my inbox and gets shifted manually) is what annoys me most, and I'm continually looking for ways to reduce it, without getting false positives. (My experiments with Bayesian filtering were a wash; after training ifile on my entire very large corpus of mail, I found that I had to continually go through the whole spam bin for false positives. With the system I use now, I don't go through the filtered ones, only the unfiltered ones that land in my inbox.)

    Some of the kinds of spam that land in my inbox include the following:

    • Messages with an enigmatic or vague subject line (that looks like a Markov chain or random dictionary words) and no content -- absolutely nothing in the body at all, no HTML part, no attachment, no nothing. I seem to get a fair amount of this, and I'm confused as to what possible reason the spammers could have for sending it.
    • 419s. I haven't found a solid way to detect them (without false positives) yet.
    • Phony giveaways
    • Adverts for warez
    • pornography
    • Adverts for medical products that do not, in fact, exist: ways to reverse the aging process, cures for cancer, and the like
    • Spam written in Latin characters, but in a language I don't read. Spanish predominates in this category, but I've seen German, French, and I think Italian. If I get any Portuguese, I probably mistake it for Spanish.
    • Spam written using non-Latin characters (but without specifying the charset as such, either because it's not specified at all or because it's unicode) that slips past the filter rule for non-alphanumeric subject lines by throwing in alphanumeric characters in a few spots.
    • Various prescription meds adverts that slip past my filtering rules. Most of them seem to slip past, even though I've tried to be clever with my regular expressions. I write stuff like "^Subject.*[Vv].?[Ii1l|].?[Aa@].?[Gg].?[Rr].?[Aa@]" but they still find other ways to say it and slip past. I think they use lookalike Unicode characters. Did I mention that Unicode is a plague and a nuisance? Yeah.
    • Sundry other nonsense and junk.
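
To see what the subject-line rule quoted in the list above does and does not catch, here is a small sketch that runs the same pattern through Python's `re` module. The sample subject lines are invented for illustration, and the Cyrillic lookalike is only one guess at what "lookalike Unicode characters" might mean in practice.

```python
import re

# The rule exactly as quoted in the post.
MEDS_RULE = re.compile(r"^Subject.*[Vv].?[Ii1l|].?[Aa@].?[Gg].?[Rr].?[Aa@]")

def rule_matches(header_line: str) -> bool:
    """True if the header line would be caught by the quoted rule."""
    return MEDS_RULE.search(header_line) is not None

# Hypothetical subject lines, not taken from real mail.
samples = [
    "Subject: cheap V1agra",           # digit substituted for a letter
    "Subject: V-i-a-g-r-a now",        # one filler character between letters
    "Subject: V1a6ra",                 # the poll's own spelling: 6 in place of g
    "Subject: V\u0456agra",            # Cyrillic U+0456 in place of Latin i
    "Subject: meeting agenda",         # ordinary mail
]

for line in samples:
    print(rule_matches(line), ascii(line))
```

The first two samples are caught. The poll's spelling and the Cyrillic variant are not, because the character classes only list the substitutions the rule's author thought of in advance.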

    However, even the stuff that gets filtered is a significant annoyance, because of the bandwidth it uses. I'm on 33.6 dialup here, so retrieving my mail takes a few minutes; when most of what I'm retrieving is unsolicited bulkmail, it's annoying to have to wait for that.

      Messages with an enigmatic or vague subject line (that looks like a Markov chain or random dictionary words) and no content -- absolutely nothing in the body at all, no HTML part, no attachment, no nothing. I seem to get a fair amount of this, and I'm confused as to what possible reason the spammers could have for sending it.

      Testing if it's a valid email address? If it doesn't bounce, your email address gets added to the "alive" list.

        I disagree. They can't reliably get information on what addresses work from the transport mechanisms. The Mail Exchanger (MX) for any given domain may simply be a relay, and unable to tell the remote host if a recipient is invalid. If your MX is able to give that information, or is a relay that can do so by using LDAP lookups, I'd be surprised if the spambot actually cared about recording the status of that particular e-mail address (a lead, if you want to make it sound nice).

        Now, in the case that you have a relay, every message will get an OK status when the spambot delivers the message. When the message gets to a host that can say if the recipient is invalid, the relay that was connected to that host will make the "bounce" message -- I'll say "DSN" here. DSNs are sent to the envelope sender of the message. There's a very slim chance that the envelope sender of a spam message goes to some mailbox that tracks the status of leads. That would make blocking spam messages much easier for us Good Guys. Most of the time, they will use an invalid user at a valid domain. Sometimes, the user is valid. That's called a Joe Job, and the user or domain will start receiving thousands of DSNs for messages that they never sent. Not fun at all.

        I think that in this case, it's simply a mistake on the spammer's part. That sort of thing is rather common -- most often, I see messages that have a bunch of tokens that are meant to be substituted before the message goes out, but aren't. I've seen some other stupid ones before, too.

        mhoward - at - hattmoward.org
      My experiments with Bayesian filtering were a wash; after training ifile on my entire very large corpus of mail, I found that I had to continually go through the whole spam bin for false positives.

      I did the same thing when I first came to Bayesian filtering, but that's not the way to get the best results out of it. Filtering is more accurate if you simply correct its mistakes as they occur than if you preload it with an existing corpus.

      There's much more information about Bayesian filtering at Paul Graham's site.

      Markus

        Filtering is more accurate if you simply correct its mistakes as they occur

        If I have to correct false positives as they occur, this so-called "filtering" is no good to me at all, because it means I have to go through all the spam. Worse than useless. My existing filtering system is significantly better, because I am confident that 100.000% of everything filtered into the spam folders is, in fact, worthless junk. Additionally, *most* of my legitimate mail is filtered into various spam-free folders based on topic, list, sender or whatever. The only mail I have to sort by hand is the stuff that lands in my inbox (because none of my filters pick it up).

        I don't want to correct my filter's errors continually. If I have to do that, it's not doing its job at ALL; *I* would be doing 100% of the filter's job, then.

Re: Most of the email spam I get is:
by Mr. Muskrat (Canon) on Dec 31, 2004 at 16:55 UTC

    I'm seeing a rise in a variation of the Nigerian 419 scam. They say that I'm the next of kin and need to claim the money.

      Ah, but their intelligence level hasn't increased. I got one the other day addressed to "Mr Clive". I don't think they've worked out the differences between first and last names quite yet.

      cLive ;-)

Re: Most of the email spam I get is:
by ww (Archbishop) on Dec 31, 2004 at 18:37 UTC
    NOT read!
Re: Most of the email spam I get is:
by holli (Abbot) on Dec 31, 2004 at 09:58 UTC
    I never had a spam problem until the day I created an account at CPAN. From then on my inbox was flooded with mails, mostly containing various viruses and worms :-((
Re: Most of the email spam I get is:
by Aristotle (Chancellor) on Jan 01, 2005 at 20:51 UTC

    The poll's missing an option: offers for software from Micr0soft, Ahobe [sic] etc at “sensational prices”.

    That's what most of mine is. The rolex knockoffs would come in at #2.

    Makeshifts last the longest.

Re: Most of the email spam I get is:
by rdm (Hermit) on Dec 31, 2004 at 07:00 UTC
    This was a tough call:
    For my personal account it's about 50/50 between something in indecipherable HTML/Java/Javascript/stuff and the Rolex knockoffs.

    For the only advertised, and non-filtered, address on the system, however, it's 50/50 basic 419 variants and lottery winner scams. Except for a single advance order scam. Just one. In almost a year. And nothing else.

    Go figure.
    -Reality might not get out of Beta today. (O.Timas, "Bot")
Re: Most of the email spam I get is:
by jaldhar (Vicar) on Jan 04, 2005 at 20:08 UTC

    Not really perl-related but the best way of stopping spam is at the MTA level with RBLs (Realtime Blackhole Lists). These are lists of the IP addresses of confirmed spammers and other machines which should not be sending mail, such as open relays, home machines, etc. Some RBLs are rather trigger-happy and sometimes blacklist legitimate users, but the following, which I use with postfix, are considered pretty reputable:

    • bl.spamcop.net
    • dnsbl.sorbs.net
    • list.dsbl.org
    • cbl.abuseat.org
    • dnsbl.njabl.org
    • sbl-xbl.spamhaus.org
    • relays.ordb.org
    • rhsbl.sorbs.net
    For the first six months I used RBLs, I kept copies of rejected mail just to make sure I wasn't bouncing legitimate mail but I didn't have one false positive. However there is a small chance this could happen. But to me that risk is worth it for the dramatic reduction in spam. For spam that manages to evade the RBLs, the combination of spamassassin (with regular updates of the rulesets), clamav, and some procmail recipes provide a secondary line of defense.

    I used to get 200-300 spam messages a day. Now thanks to these methods, I get 1 or 2.
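
For readers who have not seen how a DNS-based blacklist is queried, here is a minimal sketch of the usual convention: reverse the octets of the connecting IPv4 address, append the list's zone, and look the name up in DNS; an answer means "listed". The function names are made up for this example, `192.0.2.99` is a documentation address, and nothing here reflects how postfix does it internally. Zones come and go, so check that a list is still in service before relying on it.

```python
import ipaddress
import socket

def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for `ip` in the blacklist `zone`."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip: str, zone: str) -> bool:
    """True if the zone returns an address record for `ip`.

    Performs a real DNS lookup, so it needs network access and a zone
    that is still operating. Lookup failures are treated as "not listed".
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

# Only the name construction is shown here; no lookup is performed.
print(dnsbl_query_name("192.0.2.99", "bl.spamcop.net"))
```

An MTA does this check during the SMTP conversation and rejects the connection before the message body is transferred, which is why it also saves bandwidth.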

    --
    જલધર

Re: Most of the email spam I get is:
by TStanley (Canon) on Jan 01, 2005 at 06:24 UTC
    I think I have received all of them at some point or another. The Nigerian 419's are the funniest ones in my opinion. There is a website that makes fun of these scammers and posts their pictures. I just can't remember the link to the site at the moment.

    TStanley
    --------
    The only thing necessary for the triumph of evil is for good men to do nothing -- Edmund Burke
Re: Most of the email spam I get is:
by Drgan (Beadle) on Dec 31, 2004 at 19:38 UTC

    If pr0n isn't in vast amounts for the day, then undecipherable gibberish about v|agra could be second contender. Let's see...

    I am not: Gay, Short-Packaged, Unable to get it up, or in need of that financing information I requested two days ago. Oh yeah! I'm not a woman either, so I don't need to worry about my boob-size.

    Lucky for me, I've got a spam-filter.

    "I have said, Ye are gods; and all of you are children of the most High." - Psalms 82:6
Re: Most of the email spam I get is:
by fraktalisman (Hermit) on Jan 03, 2005 at 10:39 UTC
    The poll's missing another option: fake email failure reports - they are very common at least in Germany right now.

      Indeed I was looking for this option. I get a few hundred bounces a day.

      I get about 3-5 of those "email failures" per day. They automatically get put into my Bulk Mail folder, where they then get deleted.

      TStanley
      --------
      The only thing necessary for the triumph of evil is for good men to do nothing -- Edmund Burke
Re: Most of the email spam I get is:
by TedPride (Priest) on Dec 31, 2004 at 10:19 UTC
    I use Yahoo mail now, and their filters remove about 99% of the spam. Much better than trying to do it by hand.
      Yahoo is my "public" account, I'm very happy with the filtering.

        I've used Y!Mail in the past, and been pleased overall. However, while their filters do catch about 95% of the Spam, I had problems with false positives. I, personally, would rather see a bit more pink meat in my mailbox and be assured that false positives would be very rare.

        I've been pleased with GMail in this regard, but for real "industrial" use, I use Spam Assassin with a whitelist and a threshold score of 6.5. That seems to nab about 80-85% of the Spam, and I have had 0 false positives in about 6 months.

        Anima Legato
        .oO all things connect through the motion of the mind

Re: Most of the email spam I get is:
by Popcorn Dave (Abbot) on Jan 03, 2005 at 06:42 UTC
    Mostly for enlarging body parts - not limited to "V1a6ra or other meds". Must be my age. :)

    I have received the one for the growth hormone, but at 6'4" buying clothing is challenging enough already. ;)

    Useless trivia: In the 2004 Las Vegas phone book there are approximately 28 pages of ads for massage, but almost 200 for lawyers.
Re: Most of the email spam I get is:
by tbone1 (Monsignor) on Jan 03, 2005 at 13:30 UTC
    I chose "unreadable", but for an odd reason. I use Mail.app on OS X, and its built-in junk filter is impressive. However, it seems to be less effective at processing and interpreting Japanese and Chinese, so sometimes one of those will get through. Since I read neither, most that I get is 'unreadable'.

    Plus, I'm not sure how to tell the filter "If you see something that looks like a goat next to a sparkplug". And while I don't mind learning something of other languages and cultures, these are not the conditions under which I'd like to do so.

    --
    tbone1, YAPS (Yet Another Perl Schlub)
    And remember, if he succeeds, so what.
    - Chick McGee

Re: Most of the email spam I get is:
by virtualsue (Vicar) on Jan 01, 2005 at 21:32 UTC
    ...immediately filtered to the Trash folder. Thunderbird rocks.

      So does SpamAssassin — and it's all Perl!

      Makeshifts last the longest.

Re: Most of the email spam I get is:
by hardburn (Abbot) on Jan 03, 2005 at 14:33 UTC

    I bought a domain a while back with my own server. I have a little shell script for one-use e-mail addresses, which adds an entry to /etc/aliases in the form "tmurray-(site name)", rebuilds the alias database, and adds another entry to a text file saying what website the address is associated with.
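
The one-use address script described above is a shell script that is not shown in the post. The following is only a rough sketch of the same three steps in Python, under stated assumptions: the function names, the log format, and the site-name check are invented here, the alias line uses the standard `name: target` form of /etc/aliases, and the rebuild step calls the standard `newaliases` command.

```python
import re
import subprocess

def alias_entry(base_user: str, site: str) -> str:
    """Alias line in /etc/aliases form, e.g. 'tmurray-pair: tmurray'."""
    if not re.fullmatch(r"[a-z0-9]+", site):
        raise ValueError("site name should be lowercase letters and digits")
    return f"{base_user}-{site}: {base_user}"

def add_one_use_address(site, note, aliases_path, log_path,
                        base_user="tmurray", rebuild=True):
    """Append the alias, record what it is for, optionally rebuild the db."""
    entry = alias_entry(base_user, site)
    with open(aliases_path, "a") as aliases:
        aliases.write(entry + "\n")
    with open(log_path, "a") as log:          # hypothetical log format
        log.write(f"{base_user}-{site}\t{note}\n")
    if rebuild:
        subprocess.run(["newaliases"], check=True)  # needs root on most systems
    return entry

# Building the line has no side effects, so it is safe to show directly.
print(alias_entry("tmurray", "pair"))
```

Calling `add_one_use_address("pair", "DNS registration", "/etc/aliases", "addresses.txt")` would perform all three steps; pass `rebuild=False` and scratch file paths to try it without touching the system.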

    At my mail client, I filter each message into directories. My main inbox gets all "tmurray@" messages (mostly system e-mails and a few from friends). Then I have a few directories for mailing lists. Then there is a catch-all directory (matching "tmurray-*@"), which naturally gets most of the spam (but with a lot of legit messages, too).

    Almost all my spam comes from either one of the mailing list addresses (this mailing list didn't have address obfuscation in the archives when I first signed up) or at "tmurray-pair@" (which is on my DNS registration). It's gotten bad enough on that address that I've split off "tmurray-pair" into a separate mail directory.

    Now here is where you get to the true vileness of spammers. That address needs to be on the DNS registration. It's standard practice for good reasons. If someone needs to tell me about a problem with my domain, that's the address they'll use. However, most of what is in there is spam. Having even one false positive would be dreadful. Spammers have polluted an otherwise critical communication path into near uselessness.

    "There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.

Re: Most of the email spam I get is:
by blue_cowdawg (Monsignor) on Jan 03, 2005 at 19:19 UTC

    Using SpamAssassin and friends seems to have chopped the amount of SPAM I see somewhat. Still a lot of it gets through anyway. The spammers just get more and more clever sliding the stuff past the filters.

    I've even given some thought to how I could write an accounting script to calculate the amount of CPU time that is spent by my mail server processing it and billing the spammers for their share of what it costs to filter the crap out. When they refuse to pay take them to small claims court (keep the $$ amount below US$5000) and sue them. If they still don't pay take a lien out on their assets starting with any computers they use to send the stuff.

    Probably wouldn't work out in real life but hey... it's a thought....

A pain in the butt
by htoug (Deacon) on Jan 03, 2005 at 10:27 UTC
    and horribly irritating as I am forced to use LookOut at work :-{
Re: Most of the email spam I get is:
by ChuckularOne (Prior) on Jan 04, 2005 at 16:31 UTC
    Mine run 50/50 Nigerian scam/Viagra
Re: Most of the email spam I get is:
by zakzebrowski (Curate) on Jan 06, 2005 at 19:41 UTC
    In unicode or another language.
    Updated: Pine screen shot ... I actually get a few scam spams as well, as it turns out...


    ----
    Zak - the office
Re: Most of the email spam I get is:
by poqui (Deacon) on Jan 07, 2005 at 21:38 UTC
    Has anyone else had a huge jump in mortgage offers? I used to get a few, but in the last month they have sky-rocketed!
      When the Xmas bills come in people start getting desperate, and second and third mortgages begin to look (relatively) better. Stupid, yes. Instead, give your friends/relatives gift certificates for free computer advice. (very stupid, yes!)
Re: Most of the email spam I get is:
by gwhite (Friar) on Jan 11, 2005 at 16:37 UTC

    I keep getting ones that claim they are from my customers and there are bugs in my programs. How silly is that?

    g_White
Re: Most of the email spam I get is:
by pfaut (Priest) on Jan 12, 2005 at 01:49 UTC

    Since I added greylisting software to my sendmail setup, I've hardly received any spam except for 419 scams.

    90% of every Perl application is already written.
    dragonchild
Re: Most of the email spam I get is:
by castaway (Parson) on Jan 11, 2005 at 18:02 UTC
    This is missing the option "filtered". I seem to get mostly viagra type spam at work, where we don't have filters.. Nary a one, they installed one once as a test, complained it needed too much intervention to sort out the false positives, and uninstalled it again, so much for that. At home the only ones that trickle through are mortgage type offers these days (with the significant words full of spaces).

    I'm running 4 levels of filter.. a fetchmail preconnect deletes mail from certain countries (using the IP of the first received part) at the server (sorry, if you're from Korea or China, send me email to another of my addresses). It also deletes all mail that is for other usernames on my domain that don't actually exist as users or aliases on the system (one of these days, I'll figure out who "Castaway|castaway" is.. ). After that, it's 3 levels of SpamAssassin, at 10, 5 and 2. I could probably merge the first two, I don't think I've ever seen legit mail there since I started. Level 2 is fairly new, and catches both, so I check it often (still 95% spam though)..
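
The three SpamAssassin levels can be pictured as a simple bucketing by score. This sketch is only an illustration: the folder names are hypothetical, and it assumes a message goes to the highest level whose threshold its score reaches, which the post implies but does not spell out.

```python
# (threshold, folder) pairs, highest threshold first. Folder names are made up.
LEVELS = [
    (10.0, "spam-10"),
    (5.0, "spam-5"),
    (2.0, "spam-2"),
]

def folder_for_score(score: float) -> str:
    """Return the folder for a message with the given SpamAssassin score."""
    for threshold, folder in LEVELS:
        if score >= threshold:
            return folder
    return "inbox"

for score in (12.3, 6.5, 2.0, 1.9):
    print(score, folder_for_score(score))
```

Only the lowest bucket needs regular checking for legitimate mail, which matches the routine described above.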

    Ah the joys of email.. I've had this address 10 years, and I'm not about to stop just for some stoopid spammers..

    C.

Re: Most of the email spam I get is:
by Vynce (Friar) on Jan 11, 2005 at 23:30 UTC
    v1a6r4 is a close second to my real top, which isn't on the list: wa|l street h0t stox.
