Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re: Re: Comparing 2-D co-ordinates

by Anonymous Monk
on Jul 31, 2003 at 15:20 UTC ( [id://279608]=note: print w/replies, xml ) Need Help??


in reply to Re: Comparing 2-D co-ordinates
in thread Comparing 2-D co-ordinates

Im not sure I've understood your full specs. Whenever two matches overlap, do you want to discard the shorter? Even in this case?:
<----> <-----> <---->
Here the two shorter matches together give more coverage than the longest. Another structure to represent this data is a list in which each range occurs twice, a lexically sorted list of pairs, (start,end) and (end,start). But before choosing your data structure, you have to know what question you wan to answer. Chris Morris

Replies are listed 'Best First'.
Re: Re: Re: Comparing 2-D co-ordinates
by aging acolyte (Pilgrim) on Jul 31, 2003 at 15:35 UTC
    Chris,

    Thanks for helping clarify my problem. I want to find the set of discrete hits that gives me the best coverage (if that make sense?).

    In your example above (ah the beauty of acsii graphics) I would like to pick the two shorter examples as combined they give me a higher overall coverage.

    I apologise for any confusion that I am causing - I am having a bad day - too hot and sticky in here!

    A.A.

      Not as neat as my first attempt as this requires a sort, but then, maybe this one actually does what you want:)

      #! perl -slw use strict; my @data = map{ [ split "','", substr( $_, 1, -2 ) ] } <DATA>; my $mask = ' ' x 10_000; my $max = 0; # Sort the data to put longest ranges first (could be more efficient:) @data = sort{ $b->[2] - $b->[1] <=> $a->[2] - $a->[1] } @data; for( @data ) { # Calc range my $len = $_->[2] - $_->[1] +1; # Test to see if this range is already covered? if( substr( $mask, $_->[1], $len ) !~ m[ ] ) { $_ = undef; # if so, delete next; } # Other wise add it to the mask substr( $mask, $_->[1], $len ) = 'x' x $len; # Remeber the highest pos. $max = $_->[2] if $_->[2] > $max; } # Trim the string $mask = substr( $mask, 0, $max ); # Count coverage my $coverage = $mask =~ tr[x][x]; # Calc. percentage printf "cover = %.1f%% \n", $coverage / length($mask) *100; print 'Using:'; # Display contributing ranges having discarded deleted elements. print "@$_" for grep{ $_ } @data; __DATA__ 'NM_176827','618','692' 'NM_176827','621','710' 'NM_176827','622','692' 'NM_176827','629','710'

      Output

      P:\test>279587-3 cover = 13.0% Using: NM_176827 621 710 NM_176827 618 692

      I've also got a version that does this using vec instead of substr, which saves 7/8 of the space, but runs much more slowly.


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

      You want the to take the smallest number of your original segments which, combined, cover as much of the range as possible, with no overlaps. This sounds like a solved problem. ie; check some algorythm books/websites

      --
      I'm not belgian but I play one on TV.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://279608]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (5)
As of 2024-04-23 23:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found