PerlMonks  

Matching values between two arrays of hashes.

by tacoking92 (Novice)
on Oct 31, 2012 at 18:08 UTC (#1001725=perlquestion)
tacoking92 has asked for the wisdom of the Perl Monks concerning the following question:

I have two arrays of hashes that share a key, 'location_id'. I want to match the two arrays on that key and return the 'id' value from one of them wherever the key matches. I currently have working code, but I do not think it is the right or the fastest way to accomplish this. Example:

    my @subnets = (
        { 'id' => '1', 'location_id' => '30', 'subnet' => '255.255.255.0' },
        { 'id' => '2', 'location_id' => '13', 'subnet' => '255.255.254.0' },
        { 'id' => '3', 'location_id' => '19', 'subnet' => '255.255.0.0' },
    );

    my @filers_info = (
        { 'id' => '1', 'location_id' => '19', 'info' => 'blah1' },
        { 'id' => '2', 'location_id' => '30', 'info' => 'blah1' },
    );

    # For every subnet, scan all filers for a matching location_id.
    my @total_filers;
    foreach my $subnet (@subnets) {
        my @found_filers =
            grep { $_->{'location_id'} == $subnet->{'location_id'} } @filers_info;
        push @total_filers, @found_filers;
    }

    my @filer_ids = map { $_->{'id'} } @total_filers;

This does work as expected, but I don't think it's fast enough. In the real world @subnets contains over 100,000 records and @filers_info has around 2000. This looping through the @subnets array takes quite a while. I'm looking for a method to possibly speed this code up without using any external modules. Thanks!

Re: Matching values between two arrays of hashes.
by space_monk (Chaplain) on Oct 31, 2012 at 18:54 UTC

    Is location_id unique in either the subnets array or filers_info array? If so, why not change the definition of one or both arrays into hashes with the location_id as the key?

    I was also going to suggest changing your code to make the outer loop the filers and the inner loop the subnets, but I've forgotten which way round the rules on nested loop performance operate, and it's time I went home.... :-)
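    If location_id does turn out to be unique within @filers_info, the hash suggestion could look roughly like the sketch below. The sample records come from the question; the name %filer_for_location is made up for illustration, and uniqueness of location_id is an assumption (with duplicates, a later record would silently replace an earlier one).

```perl
use strict;
use warnings;

my @subnets = (
    { id => 1, location_id => 30, subnet => '255.255.255.0' },
    { id => 2, location_id => 13, subnet => '255.255.254.0' },
    { id => 3, location_id => 19, subnet => '255.255.0.0' },
);
my @filers_info = (
    { id => 1, location_id => 19, info => 'blah1' },
    { id => 2, location_id => 30, info => 'blah1' },
);

# Assumption: each location_id occurs at most once in @filers_info.
# %filer_for_location is a hypothetical name, not taken from the thread.
my %filer_for_location = map { $_->{location_id} => $_ } @filers_info;

# One hash lookup per subnet replaces a scan of every filer.
my @filer_ids;
for my $subnet (@subnets) {
    my $filer = $filer_for_location{ $subnet->{location_id} };
    push @filer_ids, $filer->{id} if $filer;
}

print "@filer_ids\n";
```

    The hash is built once from the smaller array, so the work grows with the sum of the two array sizes rather than their product.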

Re: Matching values between two arrays of hashes.
by Kenosis (Priest) on Oct 31, 2012 at 19:11 UTC

    I second space_monk's suggestion of using hashes keyed on the location_id. The following uses two hashes of arrays of hashes to handle multiple 'records' having the same location_id:

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        my %subnets;
        my %filers_info;

        my @subnets = (
            { 'id' => '1', 'location_id' => '30', 'subnet' => '255.255.255.0' },
            { 'id' => '2', 'location_id' => '13', 'subnet' => '255.255.254.0' },
            { 'id' => '3', 'location_id' => '19', 'subnet' => '255.255.0.0' },
        );

        my @filers_info = (
            { 'id' => '1', 'location_id' => '19', 'info' => 'blah1', },
            { 'id' => '2', 'location_id' => '30', 'info' => 'blah1', },
            { 'id' => '3', 'location_id' => '30', 'info' => 'blah1', },
        );

        push @{ $subnets{ $_->{'location_id'} } },     $_ for @subnets;
        push @{ $filers_info{ $_->{'location_id'} } }, $_ for @filers_info;

        sub original {
            my @total_filers;
            foreach my $subnet (@subnets) {
                my @found_filers =
                    grep { $_->{'location_id'} == $subnet->{'location_id'} }
                    @filers_info;
                push @total_filers, @found_filers;
            }
        }

        sub hashed {
            my @total_filers;
            exists $subnets{$_} and push @total_filers, @{ $filers_info{$_} }
                for keys %filers_info;
        }

        cmpthese(
            -5,
            {
                original => sub { original() },
                hashed   => sub { hashed() },
            }
        );

    Dumper output of @total_filers from sub hashed { ...:

        $VAR1 = [
                  {
                    'info' => 'blah1',
                    'location_id' => '30',
                    'id' => '2'
                  },
                  {
                    'info' => 'blah1',
                    'location_id' => '30',
                    'id' => '3'
                  },
                  {
                    'info' => 'blah1',
                    'location_id' => '19',
                    'id' => '1'
                  }
                ];

    Benchmark results:

                     Rate original   hashed
        original 282623/s       --     -64%
        hashed   780346/s     176%       --
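    The benchmarked subs stop at @total_filers; to get the 'id' values the question asks for, one more map is needed. The sketch below is one possible way to do that with the same grouping by location_id. The sub name filer_ids_for and the extra subnet record with id 4 are invented for illustration. One behavioural difference is worth knowing: the nested-loop version lists a filer once for every subnet that shares its location, while this version lists each filer once, however many subnets match.

```perl
use strict;
use warnings;

# Hypothetical helper: takes references to the two arrays of hashes and
# returns the 'id' of every filer whose location_id occurs in some subnet.
sub filer_ids_for {
    my ($subnets, $filers_info) = @_;

    # Group the records of each array by location_id.
    my (%subnets, %filers_info);
    push @{ $subnets{ $_->{location_id} } },     $_ for @$subnets;
    push @{ $filers_info{ $_->{location_id} } }, $_ for @$filers_info;

    # The keys are sorted only to make the output order repeatable.
    return map  { $_->{id} }
           map  { @{ $filers_info{$_} } }
           grep { exists $subnets{$_} }
           sort { $a <=> $b } keys %filers_info;
}

my @subnets = (
    { id => 1, location_id => 30, subnet => '255.255.255.0' },
    { id => 2, location_id => 13, subnet => '255.255.254.0' },
    { id => 3, location_id => 19, subnet => '255.255.0.0' },
    { id => 4, location_id => 30, subnet => '255.255.255.0' },  # invented duplicate location
);
my @filers_info = (
    { id => 1, location_id => 19, info => 'blah1' },
    { id => 2, location_id => 30, info => 'blah1' },
    { id => 3, location_id => 30, info => 'blah1' },
);

my @filer_ids = filer_ids_for(\@subnets, \@filers_info);
print "@filer_ids\n";
```

    If the repeated entries of the nested-loop version are actually wanted, each filer can be multiplied by the number of subnets stored under its location.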

    Hope this helps!

      Bingo. That's exactly what I was looking for! Thank you very much for the help. I'm not sure why I didn't think about using the location_id as a hashed key. Makes perfect sense. Thanks everyone.

        Am glad it worked for you!

Re: Matching values between two arrays of hashes.
by AnomalousMonk (Monsignor) on Oct 31, 2012 at 19:15 UTC

    I'm not entirely sure I understand your requirement, but if I do, this might fill the bill:

        >perl -wMstrict -MData::Dump -le
        "my @subnets = (
           { qw(id 1 location_id 30 subnet 255.255.255.0) },
           { qw(id 2 location_id 13 subnet 255.255.254.0) },
           { qw(id 3 location_id 19 subnet 255.255.0.0  ) },
         );
         dd \@subnets;
         ;;
         my @filers_info = (
           { qw(id 1 location_id 19 info blah1) },
           { qw(id 2 location_id 30 info blah1) },
           { qw(id 3 location_id 99 info yada1) },
         );
         dd \@filers_info;
         ;;
         my %locations = map { $_->{location_id}, 1 } @subnets;
         dd \%locations;
         ;;
         my @filer_ids = grep $locations{ $_->{location_id} }, @filers_info;
         dd \@filer_ids;
        "
        [
          { id => 1, location_id => 30, subnet => "255.255.255.0" },
          { id => 2, location_id => 13, subnet => "255.255.254.0" },
          { id => 3, location_id => 19, subnet => "255.255.0.0" },
        ]
        [
          { id => 1, info => "blah1", location_id => 19 },
          { id => 2, info => "blah1", location_id => 30 },
          { id => 3, info => "yada1", location_id => 99 },
        ]
        { 13 => 1, 19 => 1, 30 => 1 }
        [
          { id => 1, info => "blah1", location_id => 19 },
          { id => 2, info => "blah1", location_id => 30 },
        ]

Re: Matching values between two arrays of hashes.
by NetWallah (Abbot) on Oct 31, 2012 at 19:16 UTC
    Is "location_id" numeric?
    If so, you can make lookups considerably faster by using an array:

        my @filer_locations;
        $filer_locations[$_] = 1 for map { $_->{location_id} } @filers_info;

    If it is not numeric, make %filer_locations a hash instead.

    Later, you can look up a given location by indexing into @filer_locations (or the hash).
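    A minimal sketch of that lookup step, using the sample data from the question. It assumes location_id is a small non-negative integer, because the array grows to the largest id stored in it; the names @filer_locations and %filer_locations are illustrative.

```perl
use strict;
use warnings;

my @subnets = (
    { id => 1, location_id => 30, subnet => '255.255.255.0' },
    { id => 2, location_id => 13, subnet => '255.255.254.0' },
    { id => 3, location_id => 19, subnet => '255.255.0.0' },
);
my @filers_info = (
    { id => 1, location_id => 19, info => 'blah1' },
    { id => 2, location_id => 30, info => 'blah1' },
);

# Array variant: index = location_id, value = 1 if a filer lives there.
# Assumption: location_id is a small non-negative integer.
my @filer_locations;
$filer_locations[ $_->{location_id} ] = 1 for @filers_info;

# Hash variant: works for any location_id, numeric or not.
my %filer_locations = map { $_->{location_id} => 1 } @filers_info;

# Subnets whose location has at least one filer, found both ways.
my @by_array = grep { $filer_locations[ $_->{location_id} ] } @subnets;
my @by_hash  = grep { $filer_locations{ $_->{location_id} } } @subnets;

print join(',', map { $_->{id} } @by_array), "\n";
print join(',', map { $_->{id} } @by_hash),  "\n";
```

    Swapping the roles (flagging the locations found in @subnets and then filtering @filers_info) gives the matching filers instead of the matching subnets.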

                 "By three methods we may learn wisdom: First, by reflection, which is noblest; Second, by imitation, which is easiest; and third by experience, which is the bitterest."           -Confucius

Re: Matching values between two arrays of hashes.
by CountZero (Bishop) on Oct 31, 2012 at 20:48 UTC
    Dump that data into a database and use 'SELECT * FROM subnets, filersinfo WHERE subnets.location_id = filersinfo.location_id' to get all the records where location_id matches in both tables.

    I tried it with an SQLite database and it got the info in less than a second.
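    For anyone who wants to try this, a rough sketch with DBI and DBD::SQLite follows. Both are CPAN modules, so it goes against the "no external modules" wish in the question. The column layout, the in-memory database and the index name idx_subnets_location are assumptions for illustration, not a record of what was actually tested above.

```perl
use strict;
use warnings;
use DBI;    # needs DBD::SQLite from CPAN

my @subnets = (
    { id => 1, location_id => 30, subnet => '255.255.255.0' },
    { id => 2, location_id => 13, subnet => '255.255.254.0' },
    { id => 3, location_id => 19, subnet => '255.255.0.0' },
);
my @filers_info = (
    { id => 1, location_id => 19, info => 'blah1' },
    { id => 2, location_id => 30, info => 'blah1' },
);

# Assumption: an in-memory database; a file name would work as well.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE subnets    (id INTEGER, location_id INTEGER, subnet TEXT)');
$dbh->do('CREATE TABLE filersinfo (id INTEGER, location_id INTEGER, info TEXT)');

# Load both arrays inside one transaction to keep the inserts fast.
$dbh->begin_work;
my $ins_subnet = $dbh->prepare('INSERT INTO subnets VALUES (?, ?, ?)');
$ins_subnet->execute(@{$_}{qw(id location_id subnet)}) for @subnets;
my $ins_filer = $dbh->prepare('INSERT INTO filersinfo VALUES (?, ?, ?)');
$ins_filer->execute(@{$_}{qw(id location_id info)}) for @filers_info;
$dbh->commit;

# Hypothetical index so the join need not scan subnets for every filer.
$dbh->do('CREATE INDEX idx_subnets_location ON subnets (location_id)');

# Only the filer ids are selected here, since that is what the question asks for.
my $filer_ids = $dbh->selectcol_arrayref(
    'SELECT DISTINCT filersinfo.id
       FROM subnets, filersinfo
      WHERE subnets.location_id = filersinfo.location_id
      ORDER BY filersinfo.id'
);

print "@$filer_ids\n";
$dbh->disconnect;
```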

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

Node Type: perlquestion [id://1001725]
Approved by Corion