http://www.perlmonks.org?node_id=1209896

in reply to Re^3: creating and managing many hashes
in thread creating and managing many hashes

```
Products : 2008
Dates    : 530
Records  : 867434
Run Time : 57 s
```

Thanks, poj. I'm new to hashes. The snippet of code you've provided calculates the mean and std dev, but not the pair correlation. Am I correct in assuming that this bit will need to be built in, and that the run time would consequently look very different from what it is currently? Also, my data set unfortunately does not have prices and inventories for all products on all days. I'd appreciate any advice you can provide. Thank you once again.

Re^5: creating and managing many hashes
by poj (Abbot) on Feb 24, 2018 at 13:40 UTC

You would need to detail the correlation algorithm you want to use. The simple one I tried was this, where $v would be either 'price' or 'qu'. It assumes that every date has the full data for both products, so you need to decide how to handle the dates where you don't have 2 prices or 2 qu values to correlate.

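```
sub correlate {
    my ($p1, $p2, $v) = @_;    # two product ids; $v is 'price' or 'qu'
    return if $p1 eq $p2;      # skip a product paired with itself

    my ($xy, $x2, $y2) = (0, 0, 0);
    for my $date (sort keys %data) {
        # deviations from each product's mean (assumes both values exist)
        my $x = $data{$date}{$p1}{$v} - $total{$p1}{$v}{'mean'};
        my $y = $data{$date}{$p2}{$v} - $total{$p2}{$v}{'mean'};
        $xy += $x * $y;
        $x2 += $x * $x;
        $y2 += $y * $y;
    }
    my $den = sqrt($x2 * $y2);
    return unless $den;        # zero variance: correlation is undefined
    my $cor = $xy / $den;
    print "$p1 $p2 $v $cor\n";
}
```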
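One possible way to handle the dates with missing data is to correlate only over the dates where both products have a value, and to take the means over those shared dates rather than from %total. The sketch below does that. The name `correlate_common`, passing the data as a hash reference, and the choice to skip incomplete dates are all assumptions made for illustration; you may prefer a different rule, such as carrying the last known value forward.

```perl
use strict;
use warnings;

# Hypothetical helper: Pearson correlation of field $v ('price' or 'qu')
# for products $p1 and $p2, using only dates where both have a value.
# Returns the coefficient, or undef if it cannot be computed.
sub correlate_common {
    my ($data, $p1, $p2, $v) = @_;
    return if $p1 eq $p2;

    # Collect paired values; the exists() checks avoid autovivifying
    # entries for products that are missing on a date.
    my (@x, @y);
    for my $date (keys %$data) {
        my $day = $data->{$date};
        next unless exists $day->{$p1} && exists $day->{$p2};
        my ($x, $y) = ($day->{$p1}{$v}, $day->{$p2}{$v});
        next unless defined $x && defined $y;
        push @x, $x;
        push @y, $y;
    }
    my $n = @x;
    return if $n < 2;          # too few shared dates to correlate

    # Means over the shared dates only
    my ($mean_x, $mean_y) = (0, 0);
    $mean_x += $_ / $n for @x;
    $mean_y += $_ / $n for @y;

    my ($xy, $x2, $y2) = (0, 0, 0);
    for my $i (0 .. $n - 1) {
        my ($dx, $dy) = ($x[$i] - $mean_x, $y[$i] - $mean_y);
        $xy += $dx * $dy;
        $x2 += $dx * $dx;
        $y2 += $dy * $dy;
    }
    my $den = sqrt($x2 * $y2);
    return unless $den;        # zero variance: correlation is undefined
    return $xy / $den;
}

# Made-up sample data: product B has no record on 2018-02-03
my %data = (
    '2018-02-01' => { A => { price => 1 }, B => { price => 2 } },
    '2018-02-02' => { A => { price => 2 }, B => { price => 4 } },
    '2018-02-03' => { A => { price => 3 } },
    '2018-02-04' => { A => { price => 4 }, B => { price => 8 } },
);
my $cor = correlate_common(\%data, 'A', 'B', 'price');
printf "A B price %.3f\n", $cor if defined $cor;
```

On run time: calling this for every unordered pair of 2008 products means 2008 * 2007 / 2 = 2,015,028 calls, each looping over up to 530 dates, so expect it to take considerably longer than the mean and std dev pass.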