http://www.perlmonks.org?node_id=1067061


in reply to Find similar records based on multiple column with multiple criteria

Can't you query MySQL "...to identify the records that are the same for me you based on several criteria..."?

Edit: If you're more inclined to use Perl than MySQL queries, perhaps the following will provide a start:

use strict; use warnings; my @tableVals; while (<DATA>) { push @tableVals, [/\s+([-.\d]+)\s+/g] if /^\|\s+\d/; } __DATA__ +----+-----+-------+--------+----+----+--------+----+--------+---+ | M | m | p | k | y | my | r | s | o | c | +----+-----+-------+--------+----+----+--------+----+--------+---+ | 84 | 250 | 16700 | 4900 | 13 | 0 | 102124 | 23 | 0 | 0 |* si +milar | 84 | 250 | 17290 | 4905 | 13 | 6 | 102124 | 1 | 3687 | 0 |* si +milar | 84 | 250 | 17290 | 4905 | 13 | 6 | 102124 | 22 | 3687 | 2 |* si +milar | 84 | 250 | 17290 | 4910 | 13 | 6 | 102124 | 3 | 3687 | 2 |* si +milar | 84 | 250 | 16700 | 4900 | 13 | 6 | 102124 | 3 | 0 | 5 |* si +milar | 84 | 250 | 17290 | 4905 | 13 | 6 | 102124 | 4 | 3687 | 2 |* si +milar | 84 | 250 | 10200 | 46423 | 11 | 5 | 52012 | 23 | 485 | 1 |# si +milar | 84 | 250 | 10900 | 46423 | 11 | 5 | 52012 | 8 | 485 | 0 |# si +milar | 84 | 250 | 9900 | 46423 | 11 | 5 | 52012 | 22 | 485 | 1 |# si +milar | 84 | 250 | 10900 | 46423 | 11 | 5 | 52012 | 3 | 485 | 1 |# si +milar | 84 | 250 | 5200 | 150000 | 07 | 11 | 31609 | 8 | 54964 | 3 |& si +milar | 84 | 250 | 5490 | 150000 | 07 | 0 | 0 | 23 | 54964 | 0 |& si +milar | 84 | 250 | 5300 | 150000 | 07 | 11 | 31609 | 6 | 54964 | 0 |& si +milar | 84 | 250 | 14390 | 49501 | 11 | 5 | 0 | 22 | 140427 | 1 |§ si +milar | 84 | 250 | 13980 | 49501 | 11 | 5 | 31751 | 6 | 140427 | 0 |§ si +milar | 84 | 250 | 13980 | 49501 | 11 | 5 | 31751 | 3 | 140427 | 1 |§ si +milar | 84 | 250 | 14380 | 49501 | 11 | 5 | 0 | 23 | 140427 | 1 |§ si +milar | 84 | 250 | 14380 | 49501 | 11 | 5 | 0 | 1 | 140427 | 0 |§ si +milar +----+-----+-------+--------+----+----+--------+----+--------+---+

The array @tableVals will contain a array of arrays (AoA). Here's a partial Dumper of it:

$VAR1 = [ [ '84', '250', '16700', '4900', '13', '0', '102124', '23', '0', '0' ], [ '84', '250', '17290', '4905', '13', '6', '102124', '1', '3687', '0' ], [ '84', '250', '17290', '4905', ...

Each 'cluster' is a row of values from your table. If you want to access the very last element that's showing above, you can do the following:

print $tableVals[2]->[3]; # (zero indexed) row 2, col 3

So the general access format is $tableVals[row]->[col], where the values for row and col are from 0 .. n.

To sum col s, you could do the following:

my $sum; $sum += $tableVals[$_]->[7] for 0 .. $#tableVals; print $sum; # prints 204

Hope this helps!

Replies are listed 'Best First'.
Re^2: Find similar records based on multiple column with multiple criteria
by VincentK (Beadle) on Dec 13, 2013 at 20:26 UTC