Comparing elements in Array of Hashes (AoH)

hmadhi has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Comparing elements in Array of Hashes (AoH) by moritz (Cardinal) on Jul 18, 2012 at 14:14 UTC
If the inner hashes always just have those 3 keys, there's no need for an inner loop: `my $same = 0; if (@AoH_curr == @AoH_prev) { $same = 1; for my $i (0..$#AoH_curr) } my $a = $AoH_curr[$i]; my $b = $AoH_prev[$i]; if ($a->{node} ne $b->{node} \|\| $a->{link} ne $b->{link} \|\| $a->{link} ne $b->{link}) { $same = 0; last; } } }` [download] (Untested). Perl 6 - the future is here, just unevenly distributed	[reply] [d/l]
Re: Comparing elements in Array of Hashes (AoH) by BrowserUk (Patriarch) on Jul 18, 2012 at 14:44 UTC
Try it this way: `my %lookup = map{ join( $;, $_->{node}, $_->{link} ) => $_->{load} } @ +AoHCurr; exists $lookup{ join $;, $_->{node}, $_->{link} } and print "node:$_->{node} link:$_->{link} load: $_->{load} :", $lookup{ join $;, $_->{node}, $_->{link} } for @AoHPrev;` [download] That should work out to be roughly 2000 times faster than nested loops for a 10k X 10k cross-sompare. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice. The start of some sanity?	[reply] [d/l]
Re: Comparing elements in Array of Hashes (AoH) by sundialsvc4 (Abbot) on Jul 18, 2012 at 15:09 UTC
The first question in my mind is... of the 10,000 rows, how many duplicates actually exist? And let’s start just with nodes: how many nodes, regardless of link, actually exist in both lists? Now, how about links, regardless of node. In both cases, which set of “hits” is smaller ... in the actual, observed data? The device of choice is a hash. Loop through the first list and, for each key, set `$$hash{$key1} = 0`. &nbs; Now, having processed the first list entirely, on to the second list... `if (exists($$hash{$key2}) { $$hash{$key2} = 1; }`. When that second loop is finished, every hash-table entry having a value of `1` represents entries which occur in both lists. Notice that you can, in just these two loops, simultaneously find present-in-both values for both `node` and, separately, `link`. The next step, then, is to loop through both lists once final time. This time, you consider only `node` and `link` values that are known to be present in both lists. You construct a hash-key that consists of a concatenation of both values, e.g. `"$node::$link"`. The keys that you put into this final hash-table are those which are known to be highly likely to be a double-match. This strategy would achieve the answer having made three full-length passes through both lists. Memory can be reclaimed in each case by deleting hash-entries that do not contain the value `1` after the conclusion of the second loop of each pair. The algorithm is suggested on the presumption that duplicate keys are a usefully-small percentage of the total. As a footnote: what you are doing here is called a merge as well as an inner join and this approach is based on the plentiful-memory assumption that omits both of two possible alternatives which would require minimal if any programming: Putting the data into (say...) an SQLite database file, then executing an `INNER JOIN` query. Sorting two files identically using the `sort` command-line command and then merging the resulting files with the `merge` command-line command (of a Unix or Linux system). If you have to do a lot of stuff like this, or you want to do a bunch of ad-hoc work with this data, SQLite (and maybe Perl, of course) might be just the ticket. Quite an amazing piece of software... (And, so is Perl!)
Re: Comparing elements in Array of Hashes (AoH) by tobyink (Canon) on Jul 18, 2012 at 14:15 UTC
Is it correct to infer that `$i` and `$j` are fairly meaningless, and that the data in `$AoH_curr[42]` and `$AoH_prev[42]` are not necessarily related? `perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`	[reply] [d/l] [select]
Re^2: Comparing elements in Array of Hashes (AoH) by hmadhi (Acolyte) on Jul 18, 2012 at 14:23 UTC
correct	[reply]
Re^3: Comparing elements in Array of Hashes (AoH) by tobyink (Canon) on Jul 18, 2012 at 14:29 UTC
OK then, the outer arrays are essentially the wrong structure for doing a comparison - you'd be better off with hashes of hashes. But perhaps the outer arrays make sense in other parts of your code, so let's stick with them and just convert to hash temporarily while doing the comparison. The following example copies both sets of data into a temporary hash to make the comparisons a bit easier. It should run in roughly O(n) time. In other words, the amount of time it takes to run should increase linearly (rather than quadratically or exponentially) with the number of items being processed. Of course, building the temporary hash structure uses up extra memory. You're trading memory for CPU time. # Some example data... my @curr = ( { node => 'Alice', link => 'A to B', load => 20, }, { node => 'Alice', link => 'A to C', load => 24, }, { node => 'Bob', link => 'B to C', load => 12, }, { node => 'Bob', link => 'B to A', load => 3, }, { node => 'Carol', link => 'C to A', load => 14, }, { node => 'Carol', link => 'C to B', load => 7, }, ); my @prev = ( { node => 'Alice', link => 'A to B', load => 20, }, { node => 'Alice', link => 'A to C', load => 24, }, { node => 'Bob', link => 'B to C', load => 15, }, { node => 'Bob', link => 'B to A', load => 3, }, { node => 'Carol', link => 'C to A', load => 12, }, { node => 'Carol', link => 'C to B', load => 3, }, ); # Transform to a hash. This happens in roughly O(n) time. my %hash; $hash{"$_->{node}\|$_->{link}"} = +{%$_} for @curr; $hash{"$_->{node}\|$_->{link}"}{prev_load} = $_->{load} for @prev; # Again, looping through the hash values is roughly O(n). for (values %hash) { printf "Node '%s', link '%s': previously %d, currently %d.\n", $_->{node}, $_->{link}, $_->{prev_load}, $_->{load}, unless $_->{load} == $_->{prev_load}; } [download] `perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'`	[reply] [d/l]
Re: Comparing elements in Array of Hashes (AoH) by dolmen (Beadle) on Jul 18, 2012 at 14:11 UTC
Show us your code and we will tell you what's wrong with it.	[reply]
Re^2: Comparing elements in Array of Hashes (AoH) by hmadhi (Acolyte) on Jul 18, 2012 at 14:18 UTC
I have converted to references `for ( my $i = 0 ; $i <= $#{$aref_current_events} ; $i++ ) { for (my $j = 0 ; $j <= $#{$aref_previous_events} ; $j++ ) { if (( $aref_current_events->[$i]{'node'} eq "$aref_previous_events->[$j]{'node'}" ) && ($aref_current_events->[$i]{'link'} eq "$aref_previous_events->[$j]{'link'}" ) ) { print "$aref_current_events->[$i]{'load'} $aref_current_events->[$ +i]{'load'}\n"; } } }` [download]	[reply] [d/l]
Re: Comparing elements in Array of Hashes (AoH) by gri6507 (Deacon) on Jul 18, 2012 at 14:16 UTC
I am guessing that you are comparing each hash element across both AoH, one element at a time. This can be slow because it requires accessing each element and comparing each element. Instead of doing this, you could create some new indication for each array element (i.e. `join('', %{$AoH[$i]})` and then compare those indicators.	[reply] [d/l]


P is for Practical
	PerlMonks