 
PerlMonks  

Re: Applying the brakes

by tachyon-II (Chaplain)
on May 08, 2008 at 13:19 UTC ( [id://685463]=note )


in reply to Applying the brakes

You may find this comparison of MTAs interesting. Yes, it is dated, but Exim was slow as a dog back then. Here is another comparison, a bit less dated, that still shows Exim at the back of the field.

Now, the hardware used was relatively old and slow, but this is not a hardware-bound task. The bottlenecks are concurrency, network connectivity, and DNS lookups. To send an email your server has to look up the address in DNS (you want a big local cache), then make the connection to the remote server (which may take seconds), then send HELO, MAIL, RCPT, DATA, EOM, all of which can be slowed down by the remote server. The entire transaction can easily take several seconds, so you need multiple concurrent processes (50+) in your MTA to get any sort of decent throughput.
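To put rough numbers on the concurrency point: if every delivery ties up one process for a few seconds, the number of deliveries in flight is simply rate times duration (Little's law). The sketch below is only an illustration; the function names and the 3 and 5 second figures are assumptions, not measurements from the benchmarks.

```python
import math


def required_concurrency(target_per_sec: float, seconds_per_delivery: float) -> int:
    """Processes needed to sustain a target rate (Little's law: L = rate * time)."""
    return math.ceil(target_per_sec * seconds_per_delivery)


def achievable_rate(concurrent_processes: int, seconds_per_delivery: float) -> float:
    """Best-case emails per second for a fixed number of delivery processes."""
    return concurrent_processes / seconds_per_delivery


# Assumed figures: 20 emails/sec wanted, each SMTP transaction taking about 3 seconds.
print(required_concurrency(20, 3))   # 60 processes
# Assumed figures: 50 processes, 5 seconds per transaction.
print(achievable_rate(50, 5))        # 10.0 emails/sec
```

With transactions taking several seconds each, 50 processes only buys something in the region of 10-20 emails per second, which is in line with the benchmark numbers.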

You will note that performance in terms of emails per second is a feeble 5-15, depending on the MTA. Even at 20/second throughput you are looking at about 20 hours of runtime for 1.5 million emails.

I suggest you look at a dedicated mail server that also runs its own DNS with a huge cache to do this task. Exim looks like one of the less efficient options, judging from the performance benchmarks. If you dump 1.5 million emails onto your Exim queue and it is really only running at a throughput of 5/sec, you will effectively freeze outgoing email for about 80 hours, as any new message will go to the back of the queue. Even if it is doing 50/sec there will be nothing else going out for about 8 hours, unless of course there is some way to flag the bulk mail as low priority. Not only will you have an outgoing email problem, you will also have an incoming email problem, as Exim is also receiving incoming connections.
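The runtimes quoted above are just queue size divided by throughput. A quick sketch of the arithmetic, with `queue_drain_hours` being a name made up for this example:

```python
def queue_drain_hours(messages: int, per_second: float) -> float:
    """Hours needed to push `messages` emails out at a steady `per_second` rate."""
    return messages / per_second / 3600


for rate in (5, 20, 50):
    print(f"{rate:>2}/sec -> {queue_drain_hours(1_500_000, rate):.1f} hours")
# prints:
#  5/sec -> 83.3 hours
# 20/sec -> 20.8 hours
# 50/sec -> 8.3 hours
```

These match the rounded 80, 20 and 8 hour figures, and assume the rate stays constant for the whole run.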

If you did want to split up the flatfile of email addresses you don't need a database. Just use split(1).

Replies are listed 'Best First'.
Re^2: Applying the brakes
by Ryszard (Priest) on May 08, 2008 at 13:41 UTC
    good info, thanks for this.

      Just had another thought. While dumping the file (probably in paced chunks, waiting for the queue to shrink back towards zero) may be sensible, you need to permute your infile somehow to ensure you don't have:

      bob@domain sue@domain ... foo@domain bar@other_domain

      If you dump a whole series of emails to the same mail server in a row it will choke and possibly ban or throttle you. One simple approach would be to apply a sort and let the variation in usernames vaguely randomise the domains, or you could shuffle them in an array using a Fisher-Yates shuffle.
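For reference, a Fisher-Yates shuffle is only a few lines. This is a minimal sketch; the function name and the optional `rng` parameter are choices made for the example, and Python's built-in `random.shuffle` already does the same job.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Shuffle a list in place so that every ordering is equally likely."""
    # Walk from the last slot down, swapping each slot with a randomly
    # chosen slot at or before it.
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # randint includes both end points
        items[i], items[j] = items[j], items[i]
    return items


addresses = ["bob@domain", "sue@domain", "foo@domain", "bar@other_domain"]
print(fisher_yates_shuffle(addresses))
```

Note that a shuffle only spreads domains out on average. With a heavily skewed list, runs of the same domain will still turn up by chance.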

      Provided you don't have high frequencies of gmail, hotmail, or yahoo accounts, a simple sort ought to work OK; otherwise you may need some clever code to make sure that these common domains don't occur in a row.

      I would probably take the easy road and try a simple sort first and check how many times a given domain occurs in your proposed concurrency frame (probably 50-100). Domains occurring more than 2-3 times within a frame may be a problem as your MTA will be asking for that many concurrent connections.
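One way to run that check is to slide a frame over the ordered list and record, for each domain, the most times it ever appears inside a single frame. This is a rough sketch; `worst_frame_counts` and `domain_of` are names invented here, and the address parsing is deliberately naive (everything after the last `@`).

```python
from collections import Counter, deque


def domain_of(address):
    """Naive domain extraction: everything after the last '@', lower-cased."""
    return address.rsplit("@", 1)[-1].lower()


def worst_frame_counts(addresses, frame=50):
    """Map each domain to its highest count within any `frame` consecutive addresses."""
    window = deque()      # domains currently inside the frame
    counts = Counter()    # how often each domain appears in the frame
    worst = {}
    for address in addresses:
        d = domain_of(address)
        window.append(d)
        counts[d] += 1
        if len(window) > frame:
            counts[window.popleft()] -= 1
        # Only d's count went up, so only d can have set a new maximum.
        if counts[d] > worst.get(d, 0):
            worst[d] = counts[d]
    return worst


ordered = sorted(["bob@a.com", "sue@a.com", "al@b.org", "zed@a.com"])
risky = {d: n for d, n in worst_frame_counts(ordered, frame=3).items() if n > 2}
print(risky)  # domains that exceed 2 hits in some frame of 3
```

Anything that shows up in `risky` for your real frame size is a domain the plain sort did not spread out well enough.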

      Update

      Could not resist. Here is a "don't hit the same domain if we have sent it an email within the last n-width frame" algorithm to run your address list through. NB: code updated to remove a bug where a domain pulled off the FIFO in the else branch went unchecked against the current working domain. If it matches, it needs to go back on the FIFO; if not, it is good to go (untested).
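What follows is not the code from the post, only a minimal sketch of the idea as described: remember the domains of the last `width` addresses emitted, park any address whose domain is still in that frame on a FIFO, and release parked addresses once their domain has dropped out. All names here are invented for the example, and like the original it should be treated as lightly tested.

```python
from collections import deque


def domain_of(address):
    """Naive domain extraction: everything after the last '@', lower-cased."""
    return address.rsplit("@", 1)[-1].lower()


def space_out_domains(addresses, width=50):
    """Reorder addresses so a domain is not repeated within `width` sends where avoidable."""
    recent = deque(maxlen=width)  # domains of the last `width` addresses emitted
    deferred = deque()            # addresses held back, oldest first
    out = []

    def emit(address):
        out.append(address)
        recent.append(domain_of(address))

    def release_deferred():
        # One pass over the FIFO: emit what is now clear, put the rest back.
        for _ in range(len(deferred)):
            held = deferred.popleft()
            if domain_of(held) in recent:
                deferred.append(held)
            else:
                emit(held)

    for address in addresses:
        release_deferred()
        if domain_of(address) in recent:
            deferred.append(address)
        else:
            emit(address)

    # Input exhausted: drain the FIFO, forcing one out whenever nothing is clear.
    while deferred:
        before = len(deferred)
        release_deferred()
        if len(deferred) == before:
            emit(deferred.popleft())
    return out


print(space_out_domains(
    ["a@x.com", "b@x.com", "c@y.com", "d@z.com", "e@x.com"], width=2))
# ['a@x.com', 'c@y.com', 'd@z.com', 'b@x.com', 'e@x.com']
```

When one domain dominates the list, clashes at the tail are unavoidable (as with the last two addresses above), and the repeated passes over the FIFO get slow, so for 1.5 million addresses something smarter than a linear rescan would probably be needed.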

        I would probably take the easy road and try a simple sort first and check how many times a given domain occurs in your proposed concurrency frame (probably 50-100). Domains occurring more than 2-3 times within a frame may be a problem as your MTA will be asking for that many concurrent connections.

        Exim can take care of this kind of thing. For instance, it looks for all the pending mails going to the same domain and sends them over a single connection. Read the manual!
