Problems? Is your data what you think it is? PerlMonks

### Re: Data visualisation.

by Laurent_R (Abbot)
 on Jan 04, 2014 at 00:48 UTC ( #1069219=note: print w/replies, xml ) Need Help??

in reply to Data visualisation.

Alright, an update on the strategy I proposed earlier. The calculation is quite simple, no need for trigonometry, simple application of the Pythagorean theorem. Real code is less than 20 lines.

First a careful analysis of the problem:

```Let the coordinates of A be (0, 0) and B (0, 633). C is 257 someunits
+(let's say meters, we don't care) from A and 390 m from B.

Distance A-C:  sqrt (x˛ + y˛) = 257          <=> x˛ + y˛ = 257˛
+    (1)
Distance B-C:  sqrt (x˛ + (633 - y)˛) = 390  <=> x˛ + (633 - y)˛ = 390
+˛   (2)
Substrating 2 from 1:  y˛ - (633 - y)˛ = 257˛ - 390˛
<=> (y + a - y)(y - a + y) = 257˛ - 390˛
<=> a (2 y - a) = 257˛ - 390˛
<=> y = (257˛ - 390˛) / 2a + a/2

Similarly:
x˛ + y˛ = 257˛
<=> x = +/- sqrt ( 257˛ - y˛)
When calculating the coordinates of C, we just have to chose one value pair among the two possible ones. Once we have done that, we can repeat the process, calculate the two possible values with respect to A and B, and then check with C which one make sense (the one where the distance to C will be closest to the original distance to C). Now, the point is that is that check is also a way to validate in part the data.

And it turns out that the data provided is unusable, because is is not consistent.

This is the test program I originally wrote:

```use strict;
use warnings;

my %coordinates;
\$coordinates{A} = [0,0];
\$coordinates{B} = [0,633];
\$coordinates{C} = find_coord(257, 390, undef);
\$coordinates{D} = find_coord(91, 661, 228);

print "@{\$coordinates{D}} \n";

sub find_coord {
my (\$a, \$b, \$c) = @_;
my \$sq_dist_dif = \$a*\$a - \$b*\$b;
my \$y = \$sq_dist_dif/(633*2) + 633/2;
my \$x = sqrt (\$a*\$a - \$y*\$y);
return [\$x, \$y] unless defined \$c;
my \$dist_2_c_1 = distance ([\$x, \$y], \$coordinates{C}, \$c);
my \$dist_2_c_2 = distance ([-\$x, \$y], \$coordinates{C}, \$c);
\$dist_2_c_1 < \$dist_2_c_2 ? [\$x, \$y, \$dist_2_c_1] : [-\$x, \$y, \$dis
+t_2_c_2];
}
sub distance {
return abs (sqrt ((\$_[0][0] - \$_[1][0])**2 + (\$_[0][1] - \$_[1][1])
+**2) - \$_[2]);
}
The distances to C did not make any sense compared to the rest. I tried to plot the distances on a piece of paper and found that I was going into a dead end as soon as I tried to plot D. The input data simply does not work.

So I tried to produce my own data and just plotted six points on a piece of paper, and measured their distance (in millimeters) to A, B and C. I updated my program to record these changes and got the following program:

```use strict;
use warnings;
use Data::Dumper;

my %coordinates;
\$coordinates{A} = [0,0];
\$coordinates{B} = [0,90];

while (<DATA>) {
my (\$site, \$dA, \$dB, \$dC) = /(\w):\s+(\d+)\s+(\d+)\s+(\d+)/;
\$coordinates{\$site} = find_coord (\$dA, \$dB, \$dC);
}
print Dumper \%coordinates;

sub find_coord {
my (\$a, \$b, \$c) = @_;
my \$sq_dist_dif = \$a*\$a - \$b*\$b;
my \$y = \$sq_dist_dif/(90*2) + 90/2;
my \$x = sqrt (\$a*\$a - \$y*\$y);
return [\$x, \$y] unless \$c;
my \$dist_2_c_1 = distance ([\$x, \$y], \$coordinates{C}, \$c);
my \$dist_2_c_2 = distance ([-\$x, \$y], \$coordinates{C}, \$c);
\$dist_2_c_1 < \$dist_2_c_2 ? [\$x, \$y, \$dist_2_c_1] : [-\$x, \$y, \$dis
+t_2_c_2];
}
sub distance {
return abs (sqrt ((\$_[0][0] - \$_[1][0])**2 + (\$_[0][1] - \$_[1][1])
+**2) - \$_[2]);
}

__DATA__
C: 64  76  0
D: 28  70  74
E: 120  62  68
F: 68  52  93
G: 53  125 64
For each point, the program displays the calculated x and y coordinates and the deviation between the input C distance and the calculated C distance. This is the output:
```\$ perl  coordinates2.pl
\$VAR1 = {
'F' => [
'-39.0540935398867',
'55.6666666666667',
'1.33876031467558'
],
'A' => [
0,
0
],
'D' => [
'-17.1497975368678',
'22.1333333333333',
'2.41895858461325'
],
'C' => [
'53.1402755816047',
'35.6666666666667'
],
'G' => [
'46.0712491690859',
'-26.2',
'1.73078144969754'
],
'E' => [
'60.4799895486306',
'103.644444444444',
'0.372872345124534'
],
'B' => [
0,
90
]
};
As it can be seen, the deviation on C is every time small enough, compared to the other values, to show that the calculations are most probably correct.

Just in case you wished to know, these are the original coordinates I used to plot on my points:

```A 0 0
B O 90
C 55 35
D -20 20
E 60 114
F 64 55
G 45 -29

Replies are listed 'Best First'.
Re^2: Data visualisation.
by BrowserUk (Pope) on Jan 04, 2014 at 09:09 UTC
And it turns out that the data provided is unusable, because is is not consistent.

Yes. Everyone has found anomalies with the dataset. The dataset is part of an on-line library of TSP datasets.

Their main claim to fame is that they (mostly) have known solutions, and are thus useful for testing algorithms and implementations. They do not guarantee that the datasets are consistent.

And neither did I. Indeed, determining that is a big part of the motivation for my OP.

This animated gif might be of interest.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Well, I thought that providing code that actually attempts to solve the problem and that shows that the data is not valid would be useful to you and to others (even though it is only a partial validation, since it is looking at distances only to A, B and C, not to other points, but it is easy to extend it if needed).

It was fun anyway to try to solve the issue, in a subject-area quite far away from what I am doing daily with Perl. I enjoyed it. Thank you for that.

I thought that providing code that actually attempts to solve the problem ... would be useful to you and to others

It is/will be.

... and that shows that the data is not valid ...

However, the data is valid.

There is nothing in the definition of the TSP that says that the "distances" involved have to be mappable to a flat 2D plane. They could be great circle distances from a sphere; or roads that follow coast and river and skirt hills. They don't even need to be symmetrical! (Flight distances from New York to London can be substantially shorter (or longer) than that from London to New York depending upon the direction & strength of the prevailing wind.)

The desire to visualise data doesn't go away when the data itself is inconvenient.

It was fun anyway to try to solve the issue, in a subject-area quite far away from what I am doing daily with Perl. I enjoyed it. Thank you for that.

I'm glad for that; and for your help :)

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Create A New User
Node Status?
node history
Node Type: note [id://1069219]
help
Chatterbox?
 [choroba]: Check the ninja on perlbrew.pl [chacham]: oh my... [Corion]: choroba++ # fixed width marto has a Dirty_Mind on vinyl [Corion]: Heh. The upside of this phone call was at least commendations from my cow-orkers on how clam I stayed and how I always brought the topic back to my priorities and the established processes of \$work (them only getting half of the conversation).

How do I use this? | Other CB clients
Other Users?
Others romping around the Monastery: (10)
As of 2017-08-17 13:04 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
Who is your favorite scientist and why?

Results (287 votes). Check out past polls.

Notices?