PerlMonks  

Re^2: Applying the brakes

by Ryszard (Priest)
on May 08, 2008 at 13:41 UTC


in reply to Re: Applying the brakes
in thread Applying the brakes

Good info, thanks for this.


Re^3: Applying the brakes
by tachyon-II (Chaplain) on May 08, 2008 at 14:25 UTC

    Just had another thought. While dumping the file (probably in paced chunks, waiting for the queue to shrink back towards zero) may be sensible, you need to permute your infile somehow to ensure you don't have:

    bob@domain sue@domain ... foo@domain bar@other_domain

    If you dump a whole series of emails to the same mail server in a row it will choke and possibly ban or throttle you. One simple approach would be to apply a sort and let the variation in usernames vaguely randomise the domains, or you could shuffle the addresses in an array using a Fisher-Yates shuffle.
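    As an illustration of the shuffle option, here is a minimal Fisher-Yates sketch. It is written in Python rather than Perl purely for illustration, and the function name and the optional `rng` parameter are hypothetical, not something from this thread.

```python
import random

def fisher_yates_shuffle(items, rng=None):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    rng = rng or random.Random()
    shuffled = list(items)  # work on a copy so the caller's list is untouched
    # Walk from the last index down to 1, swapping each slot with a
    # randomly chosen slot at or before it.
    for i in range(len(shuffled) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        shuffled[i], shuffled[j] = shuffled[j], shuffled[i]
    return shuffled
```

    A shuffle only makes long runs of one domain unlikely; it does not guarantee anything about spacing, so the frame check discussed in this post is still worth doing afterwards.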

    Provided you don't have high frequencies of gmail, hotmail, or yahoo accounts, a simple sort ought to work OK. Otherwise you may need some clever code to make sure that these common domains don't occur in a row.

    I would probably take the easy road: try a simple sort first, then check how many times a given domain occurs within your proposed concurrency frame (probably 50-100 addresses). Domains occurring more than 2-3 times within a frame may be a problem, as your MTA will be asking for that many concurrent connections.
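    The frame check can be sketched as a sliding window over the ordered list. This is an illustrative Python sketch, not code from the thread: `domain_of`, `crowded_domains`, and the default `frame` and `limit` values are assumptions chosen to match the numbers mentioned above.

```python
from collections import Counter

def domain_of(address):
    """Return the lower-cased part after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()

def crowded_domains(addresses, frame=50, limit=3):
    """Map each domain to its peak count, for domains that appear more
    than `limit` times within any `frame` consecutive addresses."""
    domains = [domain_of(a) for a in addresses]
    window = Counter()  # domain counts inside the current frame
    peaks = {}
    for i, dom in enumerate(domains):
        window[dom] += 1
        if i >= frame:
            # Drop the address that just slid out of the frame.
            window[domains[i - frame]] -= 1
        if window[dom] > limit:
            peaks[dom] = max(peaks.get(dom, 0), window[dom])
    return peaks
```

    An empty result suggests the ordering is probably fine for that frame size; any domain that shows up is a candidate for explicit spacing.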

    Update

    Could not resist. Here is an algorithm to run your address list through: it avoids hitting a domain if we have already sent an email to it within the last n-address frame. NB: code updated to remove a bug where an address pulled off the FIFO in the else branch was not checked against the current working domain. If it matches, it needs to go back on the FIFO; if not, it is good to go (untested).
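    The original code is not reproduced here. What follows is only a rough sketch of the idea as described above, in Python rather than Perl, and it should not be read as the author's implementation. The names `space_out_domains`, `recent`, and `deferred`, and the choice to append unplaceable addresses at the end, are all assumptions.

```python
from collections import deque

def domain_of(address):
    """Return the lower-cased part after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()

def space_out_domains(addresses, frame=3):
    """Reorder addresses so a domain is not repeated within the last
    `frame` emitted addresses, where that is possible."""
    recent = deque(maxlen=frame)  # domains of the last `frame` emitted addresses
    deferred = deque()            # FIFO of addresses held back for later
    output = []

    def emit(address):
        output.append(address)
        recent.append(domain_of(address))

    def release_deferred():
        # Repeatedly scan the FIFO, oldest first, emitting every address
        # whose domain is no longer in the recent frame.
        progress = True
        while progress:
            progress = False
            for _ in range(len(deferred)):
                candidate = deferred.popleft()
                if domain_of(candidate) in recent:
                    deferred.append(candidate)  # still too soon, requeue
                else:
                    emit(candidate)
                    progress = True

    for address in addresses:
        release_deferred()
        if domain_of(address) in recent:
            deferred.append(address)
        else:
            emit(address)

    release_deferred()
    # Whatever remains cannot be spaced out with the addresses available,
    # so it is appended as-is (an assumption; pacing these is another option).
    output.extend(deferred)
    return output
```

    With the list heavily dominated by one domain, the tail of the output will still contain a run of that domain, so the sender would need to pace those addresses in time rather than in order.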

      I would probably take the easy road: try a simple sort first, then check how many times a given domain occurs within your proposed concurrency frame (probably 50-100 addresses). Domains occurring more than 2-3 times within a frame may be a problem, as your MTA will be asking for that many concurrent connections.

      Exim can take care of this kind of thing. For instance, it looks for all the pending mails going to the same domain and sends them over a single connection. Read the manual!

        Not my project! Given the benchmarks, Exim would not be the first-choice MTA in a perfect world. You have to wonder whether an attempt to send several thousand emails to, say, gmail would be accepted. Certainly when I hooked Thunderbird up to the POP side of it I got throttled to 3 email downloads per minute. Sure, POP download is different to SMTP upload, but Google, for example, seems to throttle anything that looks fast and automated across all their services, at least in my experience. Although I have no evidence to prove it, I would be far from surprised if all the big free mail services throttled anything that might easily look on the surface like a spam bot.
