note
kevbot
Hi [jnarayan81],
<p>As [pme] and [choroba] mentioned, you will get more (and better) help if you describe what your <code>nearest</code> function is supposed to do.</p>
<p>I did not closely inspect your code, but I took a guess at what you were trying to do and came up with the following example.</p>
<code>
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Table;
# Create a Data::Table with headers (assuming data is tab-delimited)
my $dt = Data::Table::fromTSV( 'mat.txt', 1 );
# Get the number of rows in the Data::Table
my $n_rows = $dt->nofRow;
my $query = [ 37,35,59,70 ];
my $nearest_name = '';
my $min_dist;
foreach my $i (0 .. $n_rows - 1) {
    # Copy the row so the table itself is not modified; the name is in the first column
    my ($name, @values) = @{ $dt->rowRef($i) };
    my $dist = dist($query, \@values);
    # Keep the closest row seen so far (on ties, the later row wins)
    if (!defined($min_dist) or $dist <= $min_dist) {
        $min_dist     = $dist;
        $nearest_name = $name;
    }
}
print "The nearest to: ";
print join(", ", @{$query});
print " is: $nearest_name\n";
exit;
# Calculate the Euclidean distance between two vectors
sub dist {
    my ($x, $y) = @_;
    unless (ref($x) eq 'ARRAY' and ref($y) eq 'ARRAY') {
        die "Vectors must be given as array references";
    }
    unless (scalar @{$x} == scalar @{$y}) {
        die "Vectors are not of equal length";
    }
    my $sum_sq = 0;
    my $len = scalar @{$x};
    foreach my $i (0 .. $len - 1) {
        $sum_sq += ($x->[$i] - $y->[$i])**2;
    }
    return sqrt($sum_sq);
}
</code>
<p>I like the [metamod://Data::Table] module for manipulating tabular data, but there are many other ways to load and manipulate your data. If your definition of <i>distance</i> or <i>best match</i> is not Euclidean distance, then modify the <code>dist</code> subroutine accordingly.</p>
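<p>For example, if Manhattan (city-block) distance suits your data better, a drop-in replacement might look like the sketch below. The name <code>manhattan_dist</code> is only my placeholder, and I am assuming the same calling convention as <code>dist</code> (two array references of equal length).</p>
<code>
use strict;
use warnings;

# Hypothetical alternative to dist(): Manhattan (city-block) distance,
# i.e. the sum of the absolute differences of the coordinates.
sub manhattan_dist {
    my ($x, $y) = @_;
    unless (ref($x) eq 'ARRAY' and ref($y) eq 'ARRAY') {
        die "Vectors must be given as array references";
    }
    unless (scalar @{$x} == scalar @{$y}) {
        die "Vectors are not of equal length";
    }
    my $sum = 0;
    foreach my $i (0 .. $#{$x}) {
        $sum += abs($x->[$i] - $y->[$i]);
    }
    return $sum;
}

# Made-up vectors, just to show the call: 3 + 5 + 1 + 5
print manhattan_dist([37, 35, 59, 70], [40, 30, 60, 65]), "\n"; # prints 14
</code>
<p>To try it, call <code>manhattan_dist($query, $row_ref)</code> in the loop instead of <code>dist($query, $row_ref)</code>; nothing else in the script should need to change.</p>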
<p><b>UPDATE:</b> Originally the <code>dist</code> sub returned <code>$sum_sq</code>. I changed it to return the correct Euclidean distance, which is <code>sqrt($sum_sq)</code>.</p>
1184298
1184298