Comparing elements in Array of Hashes (AoH)

 on Jul 18, 2012 at 14:04 UTC Need Help??
hmadhi has asked for the wisdom of the Perl Monks concerning the following question:

I have two AoH @AoH_curr and @AoH_prev. There are about 10,000 rows in each array

eg
```@AoH_curr[\$i]{'node'}

@AoH_prev[\$j]{'node'}

I would like to compare the node and link of @AoH_curr and @AoH_prev and if they are equal print out the load value for current and previous.

I need a way to do this. I have tried nested loops but it is taking way to long. Any ideas on a fast algorithm.

Replies are listed 'Best First'.
Re: Comparing elements in Array of Hashes (AoH)
by moritz (Cardinal) on Jul 18, 2012 at 14:14 UTC

If the inner hashes always just have those 3 keys, there's no need for an inner loop:

```my \$same = 0;
if (@AoH_curr == @AoH_prev) {
\$same = 1;
for my \$i (0..\$#AoH_curr) }
my \$a = \$AoH_curr[\$i];
my \$b = \$AoH_prev[\$i];
if (\$a->{node} ne \$b->{node}
\$same = 0;
last;
}
}
}

(Untested).

Re: Comparing elements in Array of Hashes (AoH)
by BrowserUk (Pope) on Jul 18, 2012 at 14:44 UTC

Try it this way:

```my %lookup = map{ join( \$;, \$_->{node}, \$_->{link} ) => \$_->{load} } @
+AoHCurr;

exists \$lookup{ join \$;, \$_->{node}, \$_->{link} }
\$lookup{ join \$;, \$_->{node}, \$_->{link} }
for @AoHPrev;

That should work out to be roughly 2000 times faster than nested loops for a 10k X 10k cross-sompare.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re: Comparing elements in Array of Hashes (AoH)
by sundialsvc4 (Abbot) on Jul 18, 2012 at 15:09 UTC

The first question in my mind is... of the 10,000 rows, how many duplicates actually exist?   And let’s start just with nodes:   how many nodes, regardless of link, actually exist in both lists?   Now, how about links, regardless of node.   In both cases, which set of “hits” is smaller ... in the actual, observed data?

The device of choice is a hash.   Loop through the first list and, for each key, set \$\$hash{\$key1} = 0. &nbs; Now, having processed the first list entirely, on to the second list... if (exists(\$\$hash{\$key2}) { \$\$hash{\$key2} = 1; }.   When that second loop is finished, every hash-table entry having a value of 1 represents entries which occur in both lists.   Notice that you can, in just these two loops, simultaneously find present-in-both values for both node and, separately, link.

The next step, then, is to loop through both lists once final time.   This time, you consider only node and link values that are known to be present in both lists.   You construct a hash-key that consists of a concatenation of both values, e.g. "\$node::\$link".   The keys that you put into this final hash-table are those which are known to be highly likely to be a double-match.

This strategy would achieve the answer having made three full-length passes through both lists.   Memory can be reclaimed in each case by deleting hash-entries that do not contain the value 1 after the conclusion of the second loop of each pair.

The algorithm is suggested on the presumption that duplicate keys are a usefully-small percentage of the total.

As a footnote: what you are doing here is called a merge as well as an inner join and this approach is based on the plentiful-memory assumption that omits both of two possible alternatives which would require minimal if any programming:

1. Putting the data into (say...) an SQLite database file, then executing an INNER JOIN query.
2. Sorting two files identically using the sort command-line command and then merging the resulting files with the merge command-line command (of a Unix or Linux system).

If you have to do a lot of stuff like this, or you want to do a bunch of ad-hoc work with this data, SQLite (and maybe Perl, of course) might be just the ticket.   Quite an amazing piece of software...   (And, so is Perl!)

Re: Comparing elements in Array of Hashes (AoH)
by dolmen (Beadle) on Jul 18, 2012 at 14:11 UTC
Show us your code and we will tell you what's wrong with it.
I have converted to references
```
for ( my \$i = 0 ; \$i <= \$#{\$aref_current_events} ; \$i++ )
{
for (my \$j = 0 ; \$j <= \$#{\$aref_previous_events} ; \$j++ )
{
if (( \$aref_current_events->[\$i]{'node'} eq
"\$aref_previous_events->[\$j]{'node'}" )
)
{
}
}
}
Re: Comparing elements in Array of Hashes (AoH)
by tobyink (Abbot) on Jul 18, 2012 at 14:15 UTC

Is it correct to infer that \$i and \$j are fairly meaningless, and that the data in \$AoH_curr[42] and \$AoH_prev[42] are not necessarily related?

perl -E'sub Monkey::do{say\$_,for@_,do{(\$monkey=[caller(0)]->[3])=~s{::}{ }and\$monkey}}"Monkey say"->Monkey::do'
correct

OK then, the outer arrays are essentially the wrong structure for doing a comparison - you'd be better off with hashes of hashes. But perhaps the outer arrays make sense in other parts of your code, so let's stick with them and just convert to hash temporarily while doing the comparison.

The following example copies both sets of data into a temporary hash to make the comparisons a bit easier. It should run in roughly O(n) time. In other words, the amount of time it takes to run should increase linearly (rather than quadratically or exponentially) with the number of items being processed. Of course, building the temporary hash structure uses up extra memory. You're trading memory for CPU time.

```# Some example data...
my @curr = (
{
node => 'Alice',
},
{
node => 'Alice',
},
{
node => 'Bob',
},
{
node => 'Bob',
},
{
node => 'Carol',
},
{
node => 'Carol',
},
);

my @prev = (
{
node => 'Alice',
},
{
node => 'Alice',
},
{
node => 'Bob',
},
{
node => 'Bob',
},
{
node => 'Carol',
},
{
node => 'Carol',
},
);

# Transform to a hash. This happens in roughly O(n) time.
my %hash;

# Again, looping through the hash values is roughly O(n).
for (values %hash)
{
printf
"Node '%s', link '%s': previously %d, currently %d.\n",
\$_->{node},
}
perl -E'sub Monkey::do{say\$_,for@_,do{(\$monkey=[caller(0)]->[3])=~s{::}{ }and\$monkey}}"Monkey say"->Monkey::do'
Re: Comparing elements in Array of Hashes (AoH)
by gri6507 (Deacon) on Jul 18, 2012 at 14:16 UTC
I am guessing that you are comparing each hash element across both AoH, one element at a time. This can be slow because it requires accessing each element and comparing each element. Instead of doing this, you could create some new indication for each array element (i.e. join('', %{\$AoH[\$i]}) and then compare those indicators.

Create A New User
Node Status?
node history
Node Type: perlquestion [id://982434]
Front-paged by Corion
help
Chatterbox?
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others having an uproarious good time at the Monastery: (11)
As of 2018-03-21 15:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
When I think of a mole I think of:

Results (268 votes). Check out past polls.

Notices?