Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^2: Finding an intersection of two sets, lean and mean

by Sinister (Friar)
on Jun 30, 2005 at 07:51 UTC ( #471246=note: print w/ replies, xml ) Need Help??


in reply to Re: Finding an intersection of two sets, lean and mean
in thread Finding an intersection of two sets, lean and mean

I have taken this in to serious consideration, however the impact of this does not justify it.

The problem with huge and fast is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty, having 5 500M(++) daemons laying around is not funny.

I need to preserve RAM space (eg: pref. no more then the size of the file) and be fast with my blacklisting, since operators are getting more and more demanding.

Dragonchild's solution was my first attempt as well, it consumes less memory but it is slow(er) so I came up with the above piece of code... TIMTOWTDI, remember?


Comment on Re^2: Finding an intersection of two sets, lean and mean
Re^3: Finding an intersection of two sets, lean and mean
by radiantmatrix (Parson) on Jun 30, 2005 at 16:26 UTC

    The problem with huge and fast is not disk space, it's memory space. The software that performs the mailings all runs daemonized, since start-up is our biggest penalty, having 5 500M(++) daemons laying around is not funny.

    Ah, I misunderstood what you meant by 'huge'. Still, if memory is your concern, that sounds like an even better reason to use a DB and let the DB handle the intersection calulations. BTW, what solution for intersection handling results in a 500MB memory footprint?! I'd like to know so I can avoid that myself.

    For the purpose of blacklisting, it might be small-and-fast to convert your list of addresses into a hash instead. Assuming you've already populated @blacklist and @address, your intersection sub might look like:

    my @BlackListed = intersect_of(\@blacklist, \@address); sub intersect_of ($$) { my $a, $b = @_; my (%set_a, %set_b); ## put the larger set in %set_a if (@$a > @$b) { %set_a = map { $_ => undef } @$a; %set_b = map { $_ => undef } @$b; else { %set_a = map { $_ => undef } @$b; %set_b = map { $_ => undef } @$a; } my @intersect; ## iterate through smaller set for (keys %set_b) { push @intersect, $_ if exists $set_a{$_} } return @intersect; }
    This exact code is untested, but I have used code like it for whitelist/blacklist processing with address list files of about 5M each, and it performed quite acceptably. YMMV, of course.

    Yoda would agree with Perl design: there is no try{}

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://471246]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others scrutinizing the Monastery: (7)
As of 2014-12-25 10:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (160 votes), past polls