Data visualisation.

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Data visualisation. (updated) by LanX (Saint) on Jan 02, 2014 at 11:23 UTC
I suppose this is a follow-up to the Travelling Salesman discussion we had. Are you even sure that the data is euclidian ~~or even metric~~, i.e. projectable into a plane??? In general thats not solvable, if not!!! (first imagine the problems from projecting distances from a spheric surface like the globe (which is still possible) and then think about randomly generated distances...) Algorithm derived from school geometry If they are ~~metric~~ euclidean you can start choosing freely point A, then B in a circle surrounding A, then C on one of the two crosspoints of the circles surrounding A and B.š From there on all other points are determined by the distances from A,B and C.ł After this you just need to transform the points (moving, rotating) to fit into a desired window.˛ Then you are free to plot with the technology of your liking (Tk, SVG, graphviz,...) HTH! Cheers Rolf ( addicted to the Perl Programming Language) updates š) Thats how a triangle is constructed with a pair of compasses, if only the length of three sides are given. Surprisingly I couldn't find an image in the net describing this. (update: watch animation here) ˛) see Clipping_(computer_graphics) ł) and at least now it should be obvious why random distances to other points will fail.	[reply]
Re^2: Data visualisation. (updated) by roboticus (Chancellor) on Jan 02, 2014 at 13:00 UTC
LanX: Update 2: I've got an updated program in Re: Data visualisation.. Update: I was thinking you meant TSP wasn't solveable. I didn't read this thread before replying, I just picked the subject out of the list and thought it was a discussion on that. I'm curious about your assertion that the general case isn't solveable in the general case if the data doesn't correspond to a planar map. I just don't see how that follows. Can you elaborate on that? Anyway, I don't believe the data was planar. A few days ago, I was twiddling with the problem, and I wanted to see the graph, so I decided to graph it and find the point coordinates (modulo translation and rotation). My test/demo data would always come up fine, but the TSP data wouldn't resolve. If time and interest permits, I'm thinking of a way to integrate force-driven layout into the code to try to get a reasonable layout with errors and residual "forces" along each edge listed. I haven't started that yet. But just in case anyone wants to play with the code, I'm including it in readmore tags below. Read more... (13 kB) ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l] [select]
Re^3: Data visualisation. (updated) by LanX (Saint) on Jan 02, 2014 at 13:17 UTC
> I'm curious about your assertion that the general case isn't solveable in the general case if the data doesn't correspond to a planar map. It depends on the criteria you have on "displayable". If it's defined such that each edge-length has to be congruent to the distance in the matrix, you can simply follow the algorithm I sketched to see how each point's position is determined by the distance to 3 other points. I.e. the distance to all other points can't be randomly chosen. Since I had differential geometry at university I know that there are "distance-true"š projections originating from non-plane spaces ~~like spheres~~˛. This is no contradiction, since this data still has to fit into aforementioned algorithm. The projection won't be "angle"š or "surface"š-true, which is a problem for maps but not for graphs. I.a.W. the origin of the data doesn't need to be from a planar geometry but the graph needs to be ~~planar~~ euclidean! I will try to update some WP links... Cheers Rolf ( addicted to the Perl Programming Language) š) not sure about the appropriate English terminology. ˛) remembered it wrong sorry, see Map_projection#Metric_properties_of_maps, and in hindsight it's obvious that projections can only preserve distances if the Pythagorean theorem holds. Though preserving size of areas and angles are no problem.	[reply]
A reply falls below the community's threshold of quality. You may see it by logging in.
Re: Data visualisation. by davies (Prior) on Jan 02, 2014 at 11:48 UTC
There are two possibilities: the set is known to be mappable or it is not. Start with point A. Place it in a predefined position in your space. The space will need to be extensible in all directions (or there must be a procedure to move all existing points to make room for a new one). Point B is then placed 633 units from A in a predefined direction. There are now at most 2 points where C can go (257 from A and 390 from B). Choose one of these by a predetermined process. The same process for C will apply to all subsequent points. This is where the question of known mappability comes in. If a set is known to be mappable but a point appears to be unmappable, use a least squares algorithm to find the most appropriate place for it. If the set might not be mappable, halt and report the problem. This is my first pass approach to the algorithm. I'll continue to think about it & may respond further. Regards, John Davies	[reply]
Re: Data visualisation. by roboticus (Chancellor) on Jan 03, 2014 at 02:35 UTC
BrowserUk: I think I have my code in fairly decent shape, functionality-wise. (It's still ugly right now, though.) However, to get most of the points in your dataset located, I had to open up the tolerance to accept up to a 15% error. (It misses only 3 points at that setting.) Do you have a way to check whether the coordinates look reasonable against your original data? I'm curious at how good the program works. I've still got some testing to do, but so far, it's been giving decent results with my test sets. I've uglified the code a bit more to draw the coordinates in a bitmap so I could do a visual check with my original coordinate sets, and it comes out good on them (three different point sets). Code in readmore tags... Read more... (16 kB) ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l]
Re^2: Data visualisation. by BrowserUk (Patriarch) on Jan 04, 2014 at 08:44 UTC
Do you have a way to check whether the coordinates look reasonable against your original data? I'm curious at how good the program works. Sorry for the delay in getting back to you Roboticus, but I was caught up with my own efforts. I tried C&ping my dataset into your code but something broke and I couldn't immediately see what. I finally just got back to it and it was simply that you use `split /\s+/` rather than `split ' '` and my data had leading whitespace which screwed everything up. Fixed that, ran it and got: <0>: (600.804026845638, 199.30760478734) <A>: (0, 0) <2>: undef <3>: (628.038255033557, 206.128479872154) <4>: undef <5>: (443.404697986577, 203.804498977408) <6>: (518.110738255034, 242.374220792608) <7>: (479.277852348993, 226.258127473328) <8>: (551.842953020134, 59.1130713295962) <C>: (113.638255033557, 265.720430138385) <10>: (228.714765100671, 164.965318248851) <11>: (629.140939597315, 105.9513007122) <12>: (532.210067114094, 195.554198273552) <13>: undef <14>: (334.319463087248, 254.225287099954) <B>: (745, 0) <16>: (476.814093959732, 202.416204394214) [download] Those undefs are the points you had trouble with I assume. Everyone has found problems with the dataset -- it comes from TSPLIB and there was never any guarantee that it was a plottable dataset; that's part of the reason for whating to try and visualise it. However, everyone seems to have problems with different points. I eventually gave up with the Law of Cosines method. Not only are there four quadrants that each point could be in, there are two points (+-y) that need to be considered. Instead I went the Circle-Circle intersection route, which proved to be far simpler. This animated gif shows the problem with the dataset quite nicely. The green arcs are the distances from B & P. The cyan arc is from the 3rd point (in this case the A point), which is used to decide (nearest wins) which of the two intersect points of the green arcs is chosen. It also highlights the accuracy or lack thereof of the correspondence between them. It shows that -- with A as the third point -- D is a particularly bad fit; but chose a different 3rd reference point and D might be spot on but some other previously good fit point becomes bad. Anyway, many thanks for your code -- you're first attempt was the cluebat I needed to get started. I've added my code below, but it is also very obfuscated with the image generation code. I need to separate the two, but it was very useful when coding. My code: #! perl -slw use strict; use Data::Dump qw[ pp ]; $Data::Dump::WIDTH = 200; use List::Util qw[ min max sum ]; use GD; use enum qw[ X Y ]; use constant PI => 3.1415926535897932384626433832795; my @N2A; @N2A[ 0 .. 25 ] = 'A'..'Z'; sub acos { atan2( sqrt( 1 - $_[0] * $_[0] ), $_[0] ) } sub asin { atan2( $_[0], sqrt( 1 - $_[0] * $_[0] ) ) } sub rgb2n{ local $^W; unpack 'N', pack 'CCCC', 0, @_ } my $BLACK = rgb2n( 0,0,0 ); my $RED = rgb2n( 255, 0, 0 ); my $GREEN = rgb2n( 0, 255, 0 ); my $BLUE = rgb2n( 0, 0, 255 ); my $YELLOW = rgb2n( 255, 255, 0 ); my $MAGENTA = rgb2n( 255, 0, 255 ); my $CYAN = rgb2n( 0, 255, 255 ); my $WHITE = rgb2n( 255,255,255 ); my( $xOrg, $yOrg, @pts, @dists ); sub plotPt{ my( $im, $pt, $label ) = @_; $im->filledArc( $pt->[X] +$xOrg, $pt->[Y] +$yOrg, 14, +14, 0, 360, $RED ); $im->string( gdSmallFont, $pt->[X]-1+$xOrg, $pt->[Y]-7+$yOrg, $lab +el, $BLACK ) } sub plotRoute{ my( $im, $pt1, $pt2 ) = @_; $im->line( $pt1->[X]+$xOrg, $pt1->[Y]+$yOrg, $pt2->[X]+$xOrg, $pt2 +->[Y]+$yOrg, $BLUE ); } sub plotArc { my( $im, $p1, $p2, $color ) = @_; $im->arc( $pts[ $p1 ][X]+$xOrg, $pts[ $p1 ][Y]+$yOrg, ( $dists[ $p1 ][ $p2 ]2 )x2, 0,360, $color ); } @dists = map[ split ' ' ], <DATA>; shift @dists; shift @$_ for @dists; sub d{ $dists[ $_[0] ][ $_[1] ] } my $dMax = max( map max( @$_ ), @dists ); my $xMax = 200+$dMax; my $yMax = 40+sqrt( $dMax2 - ( $dMax / 2 )2 ) 2; ( $xOrg, $yOrg ) = ( 100, $yMax / 2 ); my $im = GD::Image->new( $xMax, $yMax, 1 ); $im->fill( 0,0, $WHITE ); $im->line( $xOrg, 0, $xOrg, $yMax, $BLACK ); $im->line( 0, $yOrg, $xMax, $yOrg, $BLACK ); my( $p1, $p2 ) = map{ my $y = $_; map{ $dists[$y][$_] == $dMax ? ( $_, $y ) : () } 0 .. $#dists; } 0 .. $#dists; print "$p1, $p2"; $pts[ $p2 ] = [ 0, 0]; $pts[ $p1 ] = [ $dMax, 0 ]; $pts[ 0 ] = do { my( $d, $r, $R ) = ( $dMax, d( $p1, 0 ), d( $p2, 0 ) ); my $x = ( $d2 - $r2 + $R*2 ) / ( 2 $d ); my $y = ( 1/$d * sqrt( (-$d+$r-$R)(-$d-$r+$R)(-$d+$r+$R)($d+$r+ +$R) ) ) / 2; [ $x, $y ]; }; plotPt( $im, $pts[ 0 ], $N2A[ 0 ] ); plotRoute( $im, @pts[ $p1, $p2 ] ); plotPt( $im, $pts[ $p1 ], $N2A[ $p1 ] ); plotPt( $im, $pts[ $p2 ], $N2A[ $p2 ] ); my $ani = $im->gifanimbegin( 1, 10 ); for my $p ( 1 .. $#dists ) { next if $p == $p1 or $p == $p2 ; my( $d, $r, $R ) = ( $dMax, d( $p1, $p ), d( $p2, $p ) ); my $x = ( $d2 - $r2 + $R2 ) / ( 2 $d ); my $y = ( 1/$d * sqrt( (-$d+$r-$R)(-$d-$r+$R)(-$d+$r+$R)($d+$r+ +$R) ) ) / 2; plotArc( $im, $p1, $p, $GREEN ); plotArc( $im, $p2, $p, $GREEN ); plotArc( $im, 0, $p, $CYAN ); my $checkD1 = sqrt( ( $pts[0][X] - $x )2 + ( $pts[0][Y] - $y )* +2 ); my $checkD2 = sqrt( ( $pts[0][X] - $x )2 + ( $pts[0][Y] + $y ) +2 ); printf "$N2A[ $p ]: %u $checkD1 $checkD2\n", d( 0, $p ); $pts[ $p ] = [ $x, ( abs( $checkD1 - d( 0, $p ) ) < abs( $checkD2 +- d( 0, $p ) ) ) ? $y : -$y ]; plotPt( $im, $pts[ $p ], $N2A[ $p ] ); $ani .= $im->gifanimadd( 1, 0, 0, 300, ); open PNG, '>:raw', "$0.png" or die $!; print PNG $im->png; close PNG; system "$0.png"; plotArc( $im, $p1, $p, $WHITE ); plotArc( $im, $p2, $p, $WHITE ); plotArc( $im, 0, $p, $WHITE ); } $ani .= $im->gifanimend(); open GIF, '>:raw', "$0.gif" or die $!; print GIF $ani; close GIF; system "$0.gif"; my @route = ( 0, 15, 11, 8, 3, 12, 6, 7, 5, 2, 10, 4, 1, 9, 14, 13, 16 + ); plotRoute( $im, @pts[ @route[ $_-1, $_ ] ] ) for 1..$#route; plotPt( $im, $pts[ $_ ], $N2A[ $_ ] ) for 0 .. $#pts; open PNG, '>:raw', "$0.png" or die $!; print PNG $im->png; close PNG; system "$0.png"; print ' ', join ' ' x 26, 'B' .. 'Q'; for my $i ( 0 .. $#dists ) { printf "$N2A[ $i ]: "; for my $j ( $i+1 .. $#dists ) { my $checkD1 = sqrt( ( $pts[$i][X] - $pts[$j][X] )2 + ( $pts[ +$i][Y] - $pts[$j][Y] )2 ); my $checkD2 = sqrt( ( $pts[$i][X] - $pts[$j][X] )2 + ( $pts[ +$i][Y] + $pts[$j][Y] )2 ); my $checkD3 = sqrt( ( $pts[$i][X] + $pts[$j][X] )2 + ( $pts[ +$i][Y] - $pts[$j][Y] )2 ); my $checkD4 = sqrt( ( $pts[$i][X] + $pts[$j][X] )2 + ( $pts[ +$i][Y] + $pts[$j][Y] )2 ); printf "%4u (%4.f %4.f %4.f %4.f) ", d( $i, $j ), $checkD1, $c +heckD2, $checkD3, $checkD4; } print ''; } __DATA__ A B C D E F G H I J K L M +N O P Q A: 0 633 257 91 412 150 80 134 259 505 353 324 70 21 +1 268 246 121 B: 633 0 390 661 227 488 572 530 555 289 282 638 567 46 +6 420 745 518 C: 257 390 0 228 169 112 196 154 372 262 110 437 191 7 +4 53 472 142 D: 91 661 228 0 383 120 77 105 175 476 324 240 27 18 +2 239 237 84 E: 412 227 169 383 0 267 351 309 338 196 61 421 346 24 +3 199 528 297 F: 150 488 112 120 267 0 63 34 264 360 208 329 83 10 +5 123 364 35 G: 80 572 196 77 351 63 0 29 232 444 292 297 47 15 +0 207 332 29 H: 134 530 154 105 309 34 29 0 249 402 250 314 68 10 +8 165 349 36 I: 259 555 372 175 338 264 232 249 0 495 352 95 189 32 +6 383 202 236 J: 505 289 262 476 196 360 444 402 495 0 154 578 439 33 +6 240 685 390 K: 353 282 110 324 61 208 292 250 352 154 0 435 287 18 +4 140 542 238 L: 324 638 437 240 421 329 297 314 95 578 435 0 254 39 +1 448 157 301 M: 70 567 191 27 346 83 47 68 189 439 287 254 0 14 +5 202 289 55 N: 211 466 74 182 243 105 150 108 326 336 184 391 145 +0 57 426 96 O: 268 420 53 239 199 123 207 165 383 240 140 448 202 5 +7 0 483 153 P: 246 745 472 237 528 364 332 349 202 685 542 157 289 42 +6 483 0 336 Q: 121 518 142 84 297 35 29 36 236 390 238 301 55 9 +6 153 336 0 [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l] [select]
Re: Data visualisation. by roboticus (Chancellor) on Jan 02, 2014 at 13:08 UTC
BrowserUk: I've been thinking along those lines myself, and coded up a distance-matrix to coordinates program (current code in Re^2: Data visualisation. (updated)). However, the data doesn't seem to map into a planar representation. I'm thinking about (but haven't started) mixing in force-directed layout to build a planar mapping listing the stress forces along the edges and force vector from each node to approximate a 2D layout. ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply]
Re^2: Data visualisation. by LanX (Saint) on Jan 02, 2014 at 13:54 UTC
Hi roboticus Did you check if the data conforms to the Triangle_inequality ? If not I doubt that any sane visualization is possible. update well at least with straight lines... ... but I fear with curved lines it might always be possible but not usable! Cheers Rolf ( addicted to the Perl Programming Language)	[reply]
Re^3: Data visualisation. by roboticus (Chancellor) on Jan 02, 2014 at 15:14 UTC
LanX: No, I didn't think to do that. It's a trivial check, so when I get home, I'll try to remember to add that. (Unfortunately, the security restrictions at work make it extremely difficult to actually do any coding.) Update: I've now added the triangle inequality check in the version at Re: Data visualisation.. Update 2: There was an error in my triangle inequality check (it didn't affect my original results, because my code didn't use any of the invalid triangles--by luck, not design). The error was that I only made one of the three checks required. The new check function is: sub chk_triangle_inequality { # Get the points my ($A, $B, $C) = @_; # Calc the distances my ($AB, $AC, $BC) = (dist_2($A, $B), dist_2($A, $C), dist_2($B, $ +C)); if ($AB > $AC + $BC) { print "ERROR: Triangle inequality fails for points ($A, $B, $C +) ($AB > $AC+$BC)\n"; } elsif ($AC > $AB + $BC) { print "ERROR: Triangle inequality fails for points ($A, $B, $C +) ($AC > $AB+$BC)\n"; } elsif ($BC > $AB + $AC) { print "ERROR: Triangle inequality fails for points ($A, $B, $C +) ($BC > $AB+$AC)\n"; } } [download] I also added an exhaustive check at the end: `print <<EOHDR; ##### # Exhaustive triangle inequality check ##### EOHDR for my $A (0 .. ($#LOCs-2) ) { for my $B ( ($A+1) .. ($#LOCs-1) ) { for my $C ( ($B+1) .. $#LOCs ) { chk_triangle_inequality($A, $B, $C); } } }` [download] The exhaustive check gives the following misses: ##### # Exhaustive triangle inequality check ##### ERROR: Triangle inequality fails for points (0, 5, 6) (150 > 80+63) ERROR: Triangle inequality fails for points (0, 6, 7) (134 > 80+29) ERROR: Triangle inequality fails for points (0, 6, 15) (332 > 80+246) ERROR: Triangle inequality fails for points (0, 6, 16) (121 > 80+29) ERROR: Triangle inequality fails for points (1, 2, 3) (661 > 390+228) ERROR: Triangle inequality fails for points (1, 2, 13) (466 > 390+74) ERROR: Triangle inequality fails for points (1, 3, 4) (661 > 227+383) ERROR: Triangle inequality fails for points (1, 3, 5) (661 > 488+120) ERROR: Triangle inequality fails for points (1, 3, 6) (661 > 572+77) ERROR: Triangle inequality fails for points (1, 3, 7) (661 > 530+105) ERROR: Triangle inequality fails for points (1, 3, 10) (661 > 282+324) ERROR: Triangle inequality fails for points (1, 3, 12) (661 > 567+27) ERROR: Triangle inequality fails for points (1, 3, 13) (661 > 466+182) ERROR: Triangle inequality fails for points (1, 3, 14) (661 > 420+239) ERROR: Triangle inequality fails for points (1, 3, 16) (661 > 518+84) ERROR: Triangle inequality fails for points (1, 5, 6) (572 > 488+63) ERROR: Triangle inequality fails for points (1, 5, 7) (530 > 488+34) ERROR: Triangle inequality fails for points (1, 6, 7) (572 > 530+29) ERROR: Triangle inequality fails for points (1, 6, 16) (572 > 518+29) ERROR: Triangle inequality fails for points (2, 3, 12) (228 > 191+27) ERROR: Triangle inequality fails for points (2, 3, 15) (472 > 228+237) ERROR: Triangle inequality fails for points (2, 3, 16) (228 > 142+84) ERROR: Triangle inequality fails for points (2, 5, 6) (196 > 112+63) ERROR: Triangle inequality fails for points (2, 5, 7) (154 > 112+34) ERROR: Triangle inequality fails for points (2, 6, 7) (196 > 154+29) ERROR: Triangle inequality fails for points (2, 6, 16) (196 > 142+29) ERROR: Triangle inequality fails for points (3, 4, 12) (383 > 27+346) ERROR: Triangle inequality fails for points (3, 4, 16) (383 > 84+297) ERROR: Triangle inequality fails for points (3, 5, 12) (120 > 27+83) ERROR: Triangle inequality fails for points (3, 5, 15) (364 > 120+237) ERROR: Triangle inequality fails for points (3, 5, 16) (120 > 84+35) ERROR: Triangle inequality fails for points (3, 6, 12) (77 > 27+47) ERROR: Triangle inequality fails for points (3, 6, 15) (332 > 77+237) ERROR: Triangle inequality fails for points (3, 7, 12) (105 > 27+68) ERROR: Triangle inequality fails for points (3, 7, 15) (349 > 105+237) ERROR: Triangle inequality fails for points (3, 9, 12) (476 > 27+439) ERROR: Triangle inequality fails for points (3, 9, 16) (476 > 84+390) ERROR: Triangle inequality fails for points (3, 10, 12) (324 > 27+287) ERROR: Triangle inequality fails for points (3, 10, 16) (324 > 84+238) ERROR: Triangle inequality fails for points (3, 12, 13) (182 > 27+145) ERROR: Triangle inequality fails for points (3, 12, 14) (239 > 27+202) ERROR: Triangle inequality fails for points (3, 12, 15) (289 > 27+237) ERROR: Triangle inequality fails for points (3, 12, 16) (84 > 27+55) ERROR: Triangle inequality fails for points (3, 13, 15) (426 > 182+237 +) ERROR: Triangle inequality fails for points (3, 13, 16) (182 > 84+96) ERROR: Triangle inequality fails for points (3, 14, 15) (483 > 239+237 +) ERROR: Triangle inequality fails for points (3, 14, 16) (239 > 84+153) ERROR: Triangle inequality fails for points (3, 15, 16) (336 > 237+84) ERROR: Triangle inequality fails for points (4, 5, 6) (351 > 267+63) ERROR: Triangle inequality fails for points (4, 5, 7) (309 > 267+34) ERROR: Triangle inequality fails for points (4, 6, 7) (351 > 309+29) ERROR: Triangle inequality fails for points (4, 6, 16) (351 > 297+29) ERROR: Triangle inequality fails for points (5, 6, 9) (444 > 63+360) ERROR: Triangle inequality fails for points (5, 6, 10) (292 > 63+208) ERROR: Triangle inequality fails for points (5, 6, 14) (207 > 63+123) ERROR: Triangle inequality fails for points (5, 7, 9) (402 > 34+360) ERROR: Triangle inequality fails for points (5, 7, 10) (250 > 34+208) ERROR: Triangle inequality fails for points (5, 7, 14) (165 > 34+123) ERROR: Triangle inequality fails for points (6, 7, 9) (444 > 29+402) ERROR: Triangle inequality fails for points (6, 7, 10) (292 > 29+250) ERROR: Triangle inequality fails for points (6, 7, 13) (150 > 29+108) ERROR: Triangle inequality fails for points (6, 7, 14) (207 > 29+165) ERROR: Triangle inequality fails for points (6, 9, 16) (444 > 29+390) ERROR: Triangle inequality fails for points (6, 10, 16) (292 > 29+238) ERROR: Triangle inequality fails for points (6, 13, 16) (150 > 29+96) ERROR: Triangle inequality fails for points (6, 14, 16) (207 > 29+153) ERROR: Triangle inequality fails for points (9, 13, 14) (336 > 240+57) [download] ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l] [select]
Re^4: Data visualisation. by LanX (Saint) on Jan 04, 2014 at 09:38 UTC
Re^5: Data visualisation. by roboticus (Chancellor) on Jan 04, 2014 at 21:44 UTC
Some notes below your chosen depth have not been shown here
A reply falls below the community's threshold of quality. You may see it by logging in.
Re^4: Data visualisation. by LanX (Saint) on Jan 06, 2014 at 10:22 UTC
Re^2: Data visualisation. by BrowserUk (Patriarch) on Jan 02, 2014 at 14:02 UTC
coded up a distance-matrix to coordinates program Thanks. That's very helpful. However, the data doesn't seem to map into a planar representation. I assume you mean the OP data in the TSP thread. This data has nothing to do with that data. (Though it was the trigger for my current interest.) My first postulate: the maximum size of the plane required is: xMax = yMax = 1.23 * sqrt( max_distance² / 2 ). With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^3: Data visualisation. by roboticus (Chancellor) on Jan 02, 2014 at 15:23 UTC
BrowserUk: I'm glad you found it useful. The problem was interesting, so I kept playing around with it. And yes, I was referring to the OP data in the TSP thread--I haven't had a chance to check out your distance data yet. In my initial replies, I read my browser tabs out of order and rather than noticing that this was a new thread, I thought I was working in that thread, so I probably made a few mistakes in that regard. Another possibility I've been thinking about is to add a Z coordinate to see if the data could be made consistent that way. (Again referring to OP TSP data.) But if the data represents miles travelled (i.e. along the road, rather than as the crow flies), adding a single degree of freedom is probably insufficient. ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply]
Re^4: Data visualisation. by BrowserUk (Patriarch) on Jan 02, 2014 at 16:54 UTC
Re: Data visualisation. by lidden (Curate) on Jan 02, 2014 at 11:55 UTC
Not sure this is a viable approach. Start with A at (0,0) and B at (0,633). If you put a circle with radius 257 at A and another with radius 390 at B, then C will be at the intersection of these circles. This gives two points chose one. The map will be mirrored depending on the choice. The rest of the points can be found with more circle intersections. It can be done that way with paper and compass. Writing a program is not my problem ;-)	[reply]
Re: Data visualisation. by Laurent_R (Canon) on Jan 04, 2014 at 00:48 UTC
Alright, an update on the strategy I proposed earlier. The calculation is quite simple, no need for trigonometry, simple application of the Pythagorean theorem. Real code is less than 20 lines. First a careful analysis of the problem: Let the coordinates of A be (0, 0) and B (0, 633). C is 257 someunits +(let's say meters, we don't care) from A and 390 m from B. Distance A-C: sqrt (x˛ + y˛) = 257 <=> x˛ + y˛ = 257˛ + (1) Distance B-C: sqrt (x˛ + (633 - y)˛) = 390 <=> x˛ + (633 - y)˛ = 390 +˛ (2) Substrating 2 from 1: y˛ - (633 - y)˛ = 257˛ - 390˛ <=> (y + a - y)(y - a + y) = 257˛ - 390˛ <=> a (2 y - a) = 257˛ - 390˛ <=> y = (257˛ - 390˛) / 2a + a/2 Similarly: x˛ + y˛ = 257˛ <=> x = +/- sqrt ( 257˛ - y˛) [download] When calculating the coordinates of C, we just have to chose one value pair among the two possible ones. Once we have done that, we can repeat the process, calculate the two possible values with respect to A and B, and then check with C which one make sense (the one where the distance to C will be closest to the original distance to C). Now, the point is that is that check is also a way to validate in part the data. And it turns out that the data provided is unusable, because is is not consistent. This is the test program I originally wrote: use strict; use warnings; my %coordinates; $coordinates{A} = [0,0]; $coordinates{B} = [0,633]; $coordinates{C} = find_coord(257, 390, undef); $coordinates{D} = find_coord(91, 661, 228); print "@{$coordinates{D}} \n"; sub find_coord { my ($a, $b, $c) = @_; my $sq_dist_dif = $a$a - $b$b; my $y = $sq_dist_dif/(6332) + 633/2; my $x = sqrt ($a$a - $y$y); return [$x, $y] unless defined $c; my $dist_2_c_1 = distance ([$x, $y], $coordinates{C}, $c); my $dist_2_c_2 = distance ([-$x, $y], $coordinates{C}, $c); $dist_2_c_1 < $dist_2_c_2 ? [$x, $y, $dist_2_c_1] : [-$x, $y, $dis +t_2_c_2]; } sub distance { return abs (sqrt (($_[0][0] - $_[1][0])2 + ($_[0][1] - $_[1][1]) +2) - $_[2]); } [download] The distances to C did not make any sense compared to the rest. I tried to plot the distances on a piece of paper and found that I was going into a dead end as soon as I tried to plot D. The input data simply does not work. So I tried to produce my own data and just plotted six points on a piece of paper, and measured their distance (in millimeters) to A, B and C. I updated my program to record these changes and got the following program: use strict; use warnings; use Data::Dumper; my %coordinates; $coordinates{A} = [0,0]; $coordinates{B} = [0,90]; while (<DATA>) { my ($site, $dA, $dB, $dC) = /(\w):\s+(\d+)\s+(\d+)\s+(\d+)/; $coordinates{$site} = find_coord ($dA, $dB, $dC); } print Dumper \%coordinates; sub find_coord { my ($a, $b, $c) = @_; my $sq_dist_dif = $a$a - $b$b; my $y = $sq_dist_dif/(902) + 90/2; my $x = sqrt ($a$a - $y$y); return [$x, $y] unless $c; my $dist_2_c_1 = distance ([$x, $y], $coordinates{C}, $c); my $dist_2_c_2 = distance ([-$x, $y], $coordinates{C}, $c); $dist_2_c_1 < $dist_2_c_2 ? [$x, $y, $dist_2_c_1] : [-$x, $y, $dis +t_2_c_2]; } sub distance { return abs (sqrt (($_[0][0] - $_[1][0])2 + ($_[0][1] - $_[1][1]) +2) - $_[2]); } __DATA__ C: 64 76 0 D: 28 70 74 E: 120 62 68 F: 68 52 93 G: 53 125 64 [download] For each point, the program displays the calculated x and y coordinates and the deviation between the input C distance and the calculated C distance. This is the output: `$ perl coordinates2.pl $VAR1 = { 'F' => [ '-39.0540935398867', '55.6666666666667', '1.33876031467558' ], 'A' => [ 0, 0 ], 'D' => [ '-17.1497975368678', '22.1333333333333', '2.41895858461325' ], 'C' => [ '53.1402755816047', '35.6666666666667' ], 'G' => [ '46.0712491690859', '-26.2', '1.73078144969754' ], 'E' => [ '60.4799895486306', '103.644444444444', '0.372872345124534' ], 'B' => [ 0, 90 ] };` [download] As it can be seen, the deviation on C is every time small enough, compared to the other values, to show that the calculations are most probably correct. Just in case you wished to know, these are the original coordinates I used to plot on my points: `A 0 0 B O 90 C 55 35 D -20 20 E 60 114 F 64 55 G 45 -29` [download]	[reply] [d/l] [select]
Re^2: Data visualisation. by BrowserUk (Patriarch) on Jan 04, 2014 at 09:09 UTC
And it turns out that the data provided is unusable, because is is not consistent. Yes. Everyone has found anomalies with the dataset. The dataset is part of an on-line library of TSP datasets. Their main claim to fame is that they (mostly) have known solutions, and are thus useful for testing algorithms and implementations. They do not guarantee that the datasets are consistent. And neither did I. Indeed, determining that is a big part of the motivation for my OP. This animated gif might be of interest. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^3: Data visualisation. by Laurent_R (Canon) on Jan 04, 2014 at 10:53 UTC
Well, I thought that providing code that actually attempts to solve the problem and that shows that the data is not valid would be useful to you and to others (even though it is only a partial validation, since it is looking at distances only to A, B and C, not to other points, but it is easy to extend it if needed). It was fun anyway to try to solve the issue, in a subject-area quite far away from what I am doing daily with Perl. I enjoyed it. Thank you for that.	[reply]
Re^4: Data visualisation. by BrowserUk (Patriarch) on Jan 04, 2014 at 11:44 UTC
Re^5: Data visualisation. by Laurent_R (Canon) on Jan 04, 2014 at 12:38 UTC
Some notes below your chosen depth have not been shown here
Re: Data visualisation. by davies (Prior) on Jan 03, 2014 at 19:48 UTC
OK, I have some working code. It's merely intended to demonstrate the algorithm, so I've done it in Excel without all the usual clever dickery I usually use in Excel. The spreadsheet, including the resulting chart, is at https://gitorious.org/visualisation/visualisation/source/108fdc930d57ea0aeda62fe8dc741bf95e40c28e:. What I don't have is working data. If you consider your points B, C & D, B to D is 661, while BC is 390 and CD 228. So the total from B to D via C is 618, less than the BD distance. This involves an imaginative one way system that I can't visualise. Either that, or the crow is flying into some strong headwinds. :-) The code isn't intended to do anything clever like check data validity. If you run it with all your data, it will produce a visualisation as I will explain below. However, if run with just A, B, C and D, it will crash as it tries to find the square root of a negative number. The algorithm works as follows. First it finds the largest single distance, in this case BP. Then it transforms the matrix so that this value is at the top left. It assumes a north-south line between the two. Then it adds the third point, A, using triangle calculations to work out how far to the east to put it. It then loops through the rest of the lines, calculating the X and Y co-ordinates in the same way, but before placing the point, it tests whether the fit with the third point (A) would be better if it were to the left or the right of the original line. This is how it gets around the problems described above - it doesn't use all the data, just the relationships with B, P and A. Then it draws the graph. Because it can't do that very well and Messware won't add the necessary functionality (I haven't looked at 2010 or 2013), marking the points sensibly is a complicated business. If you want to run my macros, delete the "Chart26" and "Co-ordinates" sheets first. I haven't checked spreadsheet or data integrity in any way. Regards, John Davies Update: corrected URL that was pointing to an out of date commit.	[reply]
Re^2: Data visualisation. by BrowserUk (Patriarch) on Jan 04, 2014 at 09:00 UTC
I've done it in Excel Thanks for that. Unfortunately, don't have anything installed at the moment that let's me open Excel files. What I don't have is working data. If you consider your points B, C & D, B to D is 661, while BC is 390 and CD 228. So the total from B to D via C is 618, less than the BD distance. This involves an imaginative one way system that I can't visualise. Either that, or the crow is flying into some strong headwinds. :-) The dataset comes from TSPLIB(gr_17.tsp) and they make no guarantees (nor even statements) about the plot-ability of the sets. That's a big part of the motivation for wanting to try and visualise them. The algorithm works as follows. First it finds the largest single distance, in this case BP. Then it transforms the matrix so that this value is at the top left. It assumes a north-south line between the two. Then it adds the third point, A, using triangle calculations to work out how far to the east to put it. It then loops through the rest of the lines, calculating the X and Y co-ordinates in the same way, but before placing the point, it tests whether the fit with the third point (A) would be better if it were to the left or the right of the original line. This is how it gets around the problems described above - it doesn't use all the data, just the relationships with B, P and A. That sounds very similar to the approach I used except I put the baseline(B-P) horizontally. This is what the plotting process looks like. And the code that produces that is here. I hope you had as much fun with the problem as I did :) With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^3: Data visualisation. by berends (Novice) on Jan 04, 2014 at 13:38 UTC
LibreOffice Calc can open the spreadsheet and show the code in the macros. Gnumeric opens the spreadsheet file, shows the sheets containing the input data, a chart (XY plot) of the calculated results and a table of results (coordinates). The labelling of the charts is wrong in both Calc and Gnumeric but the data points look correctly positioned.	[reply]
Re: Data visualisation. by Laurent_R (Canon) on Jan 02, 2014 at 19:04 UTC
Hi, what about something along these lines. Set A to coordinates (0,0) and B to coordinates (0, 633). Say C has coordinates (x, y). You have: `sqrt (x˛ + y˛) = 257 sqrt(x˛ + (633 - y)˛) = 390` [download] Solving this system of two equations is straight forward. (Hoping I did not goof the maths ;-) For the next points, one needs to be a bit more general and use the distances to A, B and C.	[reply] [d/l]
Re: Data visualisation. by Anonymous Monk on Jan 02, 2014 at 20:46 UTC
You can pick some starting coordinates and then iteratively scoot the points around until you (hopefully) get a reasonable solution. I came up with this set of coordinates, which isn't too bad. `A 125.5 -13.0 B 5.8 622.6 C 170.2 250.3 D 67.6 28.0 E 54.9 392.3 F 128.8 137.6 G 126.6 63.5 H 137.6 100.2 I -115.1 83.3 J 219.8 483.8 K 117.0 352.0 L -176.7 28.1 M 86.2 63.7 N 189.9 182.1 O 207.5 242.4 P -124.3 -117.7 Q 116.8 109.2` [download]	[reply] [d/l]
Re^2: Data visualisation. by BrowserUk (Patriarch) on Jan 02, 2014 at 22:05 UTC
iteratively scoot the points I searched CPAN & google for a "scoot"ing algorithm without luck. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]
Re^3: Data visualisation. by Anonymous Monk on Jan 02, 2014 at 23:48 UTC
Heh. That's not a technical term. You can do something like this: `for $i (0 .. $npts-1) { for $j ($i+1 .. $npts-1) { $dx = $x[$j] - $x[$i]; $dy = $y[$j] - $y[$i]; $d = sqrt($dx$dx + $dy$dy); $f = $d - $distance[$i][$j]; $fx[$i] += $f * $dx / $d; $fy[$i] += $f * $dy / $d; $fx[$j] -= $f * $dx / $d; $fy[$j] -= $f * $dy / $d; } }` [download] That gives you the x and y components of the total "force" on each point. Then you can update the positions. `for $i (0 .. $npts-1) { $x[$i] += $dt * $fx[$i]; $y[$i] += $dt * $fy[$i]; }` [download] The trick is choosing a good time step `$dt`. Too small and it will converge very slowly. Too high and it will rattle around and never settle down. (You might want to look at the forces before choosing a time step, which is why I've put the update in a separate loop.) This kind of thing is sensitive to initial conditions and doesn't always converge well, so you may need to fiddle around with it to get it to work.	[reply] [d/l] [select]
Re^4: Data visualisation. by BrowserUk (Patriarch) on Jan 03, 2014 at 00:08 UTC
Re^5: Data visualisation. by Anonymous Monk on Jan 03, 2014 at 02:30 UTC
Some notes below your chosen depth have not been shown here
Re^5: Data visualisation. by roboticus (Chancellor) on Jan 03, 2014 at 00:57 UTC


Perl Monk, Perl Meditation
	PerlMonks

Data visualisation.

Algorithm derived from school geometry

updates

update