The problem with huge and fast is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty; having five 500MB(++) daemons lying around is not funny.
Ah, I misunderstood what you meant by 'huge'. Still, if memory is your concern, that sounds like an even better reason to use a DB and let the DB handle the intersection calculations. BTW, what solution for intersection handling results in a 500MB memory footprint?! I'd like to know so I can avoid that myself.
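For what it's worth, the DB-backed version can be a single join. This is only a sketch: the DSN and the addresses/blacklist table names are made up for illustration, so adjust to whatever schema you actually have.

use DBI;

## connect to wherever the lists live (example DSN only)
my $dbh = DBI->connect("dbi:SQLite:dbname=mail.db", "", "",
    { RaiseError => 1, AutoCommit => 1 });

## the join runs inside the DB's working set, so neither
## full address list has to be loaded into the daemon
my $blacklisted = $dbh->selectcol_arrayref(q{
    SELECT a.address
    FROM   addresses a
    JOIN   blacklist b ON b.address = a.address
});

That keeps the daemon's footprint down to the size of the result set rather than both input lists.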
For the purpose of blacklisting, it might be small-and-fast to convert your list of addresses into a hash instead. Assuming you've already populated @blacklist and @address, your intersection sub might look like:
my @BlackListed = intersect_of(\@blacklist, \@address);
sub intersect_of ($$) {
    my ($aref, $bref) = @_;
    my (%set_a, %set_b);
    ## put the larger set in %set_a
    if (@$aref > @$bref) {
        %set_a = map { $_ => undef } @$aref;
        %set_b = map { $_ => undef } @$bref;
    }
    else {
        %set_a = map { $_ => undef } @$bref;
        %set_b = map { $_ => undef } @$aref;
    }
    my @intersect;
    ## iterate through the smaller set (hashing it also weeds out duplicates)
    for (keys %set_b) {
        push @intersect, $_ if exists $set_a{$_};
    }
    return @intersect;
}
This exact code is untested, but I have used code like it for whitelist/blacklist processing with address list files of about 5M each, and it performed quite acceptably. YMMV, of course.
Yoda would agree with Perl design: there is no try{}