Beefy Boxes and Bandwidth Generously Provided by pair Networks
Syntactic Confectionery Delight
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

Update: sorry, but this doesn't work. It misses many overlapping sets.

If I understand your objectives correctly, something like the following untested code may help. It assumes the input data is sorted by the first field. The method of detecting overlaps is quite different from what you posted. I don't know if it gives the same results. It has the advantage that it does a single pass over each set of input records.

use strict; use warnings; open(my $N148,'<',"148Nsorted.bed") or die $!; open(my $N162,'<',"162Nsorted.bed") or die $!; open(my $N174,'<',"174Nsorted.bed") or die $!; open(my $N175,'<',"175Nsorted.bed") or die $!; my ($chr148, $start148, $end148) = scan($N148); my ($chr162, $start162, $end162) = scan($N162); my ($chr174, $start174, $end174) = scan($N174); my ($chr175, $start175, $end175) = scan($N175); my ($minstart, $maxstart, $minend, $maxend); while(defined($chr148)) { $minstart = $start148; $maxstart = $start148; $minend = $end148; $maxend = $end148; while(defined($chr162) and $chr162 lt $chr148) { ($chr162, $start162, $end162) = scan($N162); } while(defined($chr162) and $chr162 eq $chr148) { $minstart = $start162 if($start162 < $minstart); $maxstart = $start162 if($start162 > $maxstart); $minend = $end162 if($end162 < $minend); $maxend = $end162 if($end162 > $maxend); if($maxstart <= $minend) { while(defined($chr174) and $chr174 lt $chr148) { ($chr174, $start174, $end174) = scan($N174); } while(defined($chr174) and $chr174 eq $chr148) { $minstart = $start174 if($start174 < $minstart); $maxstart = $start174 if($start174 > $maxstart); $minend = $end174 if($end174 < $minend); $maxend = $end174 if($end174 > $maxend); if($maxstart <= $minend) { while(defined($chr175) and $chr175 lt $chr148) { ($chr175, $start175, $end175) = scan($N175); } while(defined($chr175) and $chr175 eq $chr148) { $minstart = $start175 if($start175 < $minstart +); $maxstart = $start175 if($start175 > $maxstart +); $minend = $end175 if($end175 < $minend); $maxend = $end175 if($end175 > $maxend); if($maxstart <= $minend) { print "$chr148 $minstart $maxend\n"; } ($chr175, $start175, $end175) = scan($N175); } } ($chr174, $start174, $end174) = scan($N174); } } ($chr162, $start162, $end162) = scan($N162); } ($chr148, $start148, $end148) = scan($N148); } close($N148); close($N162); close($N174); close($N175); exit(0); sub scan { my ($fh) = @_; my $line = <$fh>; return() unless($line); return(split(/\t/, $line)); }

update: the following might do what you want. It will take more memory and time than the above but it will find all overalpping sets.

use strict; use warnings; use List::Util qw(min max); my $ranges148 = scan('148Nsorted.bed'); my $ranges162 = scan('162Nsorted.bed'); my $ranges174 = scan('174Nsorted.bed'); my $ranges175 = scan('175Nsorted.bed'); foreach my $chr (keys %$ranges148) { foreach my $range148 (@{$ranges148->{$chr}}) { foreach my $range162 (@{$ranges162->{$chr}}) { foreach my $range174 (@{$ranges174->{$chr}}) { foreach my $range175 (@{$ranges175->{$chr}}) { my $minstart = min( $range148->[0], $range162->[0], $range174->[0], $range175->[0]); my $maxstart = max( $range148->[0], $range162->[0], $range174->[0], $range175->[0]); my $minend = min( $range148->[1], $range162->[1], $range174->[1], $range175->[1]); my $maxend = max( $range148->[1], $range162->[1], $range174->[1], $range175->[1]); if($maxstart <= $minend) { print "overlapping $chr: $minstart, $maxend - +($range148->[0], $range148->[1]), ($range162->[0], $range162->[1]), ( +$range174->[0], $range174->[1]), ($range175->[0], $range175->[1])\n"; } else { print "not overlapping: $chr: $minstart, $maxe +nd - ($range148->[0], $range148->[1]), ($range162->[0], $range162->[1 +]), ($range174->[0], $range174->[1]), ($range175->[0], $range175->[1] +)\n"; } } } } } } exit(0); sub scan { my ($file) = @_; open(my $fh, '<', $file) or die "$file: $!"; my $ranges; foreach my $line (<$fh>) { chomp($line); my ($chr, $start, $end) = split(/\t/, $line); push(@{$ranges->{$chr}}, [ 0+$start, 0+$end ]); } return($ranges); }

In reply to Re: Find overlap by ig
in thread Find overlap by linseyr

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others musing on the Monastery: (14)
    As of 2014-10-24 09:54 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      For retirement, I am banking on:










      Results (131 votes), past polls