http://www.perlmonks.org?node_id=1184305


in reply to How to extract the best match from matrix

Hi jnarayan81,

As pme and choroba mentioned, you will get more/better help if you describe the objective of your nearest function.

I did not closely inspect your code, but I took a guess at what you were trying to do and came up with the following example.

#!/usr/bin/env perl use strict; use warnings; use Data::Table; # Create a Data::Table with headers (assuming data is tab-delimited) my $dt = Data::Table::fromTSV( 'mat.txt', 1 ); # Get the number of rows in the Data::Table my $n_rows = $dt->nofRow; my $query = [ 37,35,59,70 ]; my $nearest_name = ''; my $min_dist; foreach my $i (0..$n_rows - 1){ my $row_ref = $dt->rowRef($i); # Get row of Data::Table as an ARRA +Y REF my $name = shift @{$row_ref}; # The name is in the first column my $dist = dist($query, $row_ref); $min_dist = !defined($min_dist) ? $dist : $dist < $min_dist ? $dist : $min_dist; $nearest_name = $dist <= $min_dist ? $name : $nearest_name; } print "The nearest to: "; print join(", ", @{$query}); print " is: $nearest_name\n"; exit; # Calculate the Euclidean distance between two vectors sub dist { my ($x, $y) = @_; unless(ref($x) eq 'ARRAY' and ref($y) eq 'ARRAY'){ die "Vectors must be given as array references" } unless (scalar @{$x} == scalar @{$y}) { die "Vectors are not of equal length"; } my $sum_sq = 0; my $len = scalar @{$x}; foreach my $i (0..$len - 1) { $sum_sq += ($x->[$i] - $y->[$i])**2; } return sqrt($sum_sq); }

I like the Data::Table module for manipulating tabular data, but there are many other ways to load/manipulate your data. If your definition of distance or best match is not euclidean distance then modify the dist subroutine accordingly.

UPDATE: Originally the dist sub returned $sum_sq. I changed it to return the correct euclidean distance which is sqrt($sum_sq).