Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

Update: sorry, but this doesn't work. It misses many overlapping sets.

If I understand your objectives correctly, something like the following untested code may help. It assumes the input data is sorted by the first field. The method of detecting overlaps is quite different from what you posted. I don't know if it gives the same results. It has the advantage that it does a single pass over each set of input records.

use strict; use warnings; open(my $N148,'<',"148Nsorted.bed") or die $!; open(my $N162,'<',"162Nsorted.bed") or die $!; open(my $N174,'<',"174Nsorted.bed") or die $!; open(my $N175,'<',"175Nsorted.bed") or die $!; my ($chr148, $start148, $end148) = scan($N148); my ($chr162, $start162, $end162) = scan($N162); my ($chr174, $start174, $end174) = scan($N174); my ($chr175, $start175, $end175) = scan($N175); my ($minstart, $maxstart, $minend, $maxend); while(defined($chr148)) { $minstart = $start148; $maxstart = $start148; $minend = $end148; $maxend = $end148; while(defined($chr162) and $chr162 lt $chr148) { ($chr162, $start162, $end162) = scan($N162); } while(defined($chr162) and $chr162 eq $chr148) { $minstart = $start162 if($start162 < $minstart); $maxstart = $start162 if($start162 > $maxstart); $minend = $end162 if($end162 < $minend); $maxend = $end162 if($end162 > $maxend); if($maxstart <= $minend) { while(defined($chr174) and $chr174 lt $chr148) { ($chr174, $start174, $end174) = scan($N174); } while(defined($chr174) and $chr174 eq $chr148) { $minstart = $start174 if($start174 < $minstart); $maxstart = $start174 if($start174 > $maxstart); $minend = $end174 if($end174 < $minend); $maxend = $end174 if($end174 > $maxend); if($maxstart <= $minend) { while(defined($chr175) and $chr175 lt $chr148) { ($chr175, $start175, $end175) = scan($N175); } while(defined($chr175) and $chr175 eq $chr148) { $minstart = $start175 if($start175 < $minstart +); $maxstart = $start175 if($start175 > $maxstart +); $minend = $end175 if($end175 < $minend); $maxend = $end175 if($end175 > $maxend); if($maxstart <= $minend) { print "$chr148 $minstart $maxend\n"; } ($chr175, $start175, $end175) = scan($N175); } } ($chr174, $start174, $end174) = scan($N174); } } ($chr162, $start162, $end162) = scan($N162); } ($chr148, $start148, $end148) = scan($N148); } close($N148); close($N162); close($N174); close($N175); exit(0); sub scan { my ($fh) = @_; my $line = <$fh>; return() unless($line); return(split(/\t/, $line)); }

update: the following might do what you want. It will take more memory and time than the above but it will find all overalpping sets.

use strict; use warnings; use List::Util qw(min max); my $ranges148 = scan('148Nsorted.bed'); my $ranges162 = scan('162Nsorted.bed'); my $ranges174 = scan('174Nsorted.bed'); my $ranges175 = scan('175Nsorted.bed'); foreach my $chr (keys %$ranges148) { foreach my $range148 (@{$ranges148->{$chr}}) { foreach my $range162 (@{$ranges162->{$chr}}) { foreach my $range174 (@{$ranges174->{$chr}}) { foreach my $range175 (@{$ranges175->{$chr}}) { my $minstart = min( $range148->[0], $range162->[0], $range174->[0], $range175->[0]); my $maxstart = max( $range148->[0], $range162->[0], $range174->[0], $range175->[0]); my $minend = min( $range148->[1], $range162->[1], $range174->[1], $range175->[1]); my $maxend = max( $range148->[1], $range162->[1], $range174->[1], $range175->[1]); if($maxstart <= $minend) { print "overlapping $chr: $minstart, $maxend - +($range148->[0], $range148->[1]), ($range162->[0], $range162->[1]), ( +$range174->[0], $range174->[1]), ($range175->[0], $range175->[1])\n"; } else { print "not overlapping: $chr: $minstart, $maxe +nd - ($range148->[0], $range148->[1]), ($range162->[0], $range162->[1 +]), ($range174->[0], $range174->[1]), ($range175->[0], $range175->[1] +)\n"; } } } } } } exit(0); sub scan { my ($file) = @_; open(my $fh, '<', $file) or die "$file: $!"; my $ranges; foreach my $line (<$fh>) { chomp($line); my ($chr, $start, $end) = split(/\t/, $line); push(@{$ranges->{$chr}}, [ 0+$start, 0+$end ]); } return($ranges); }

In reply to Re: Find overlap by ig
in thread Find overlap by linseyr

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others perusing the Monastery: (1)
    As of 2018-05-27 23:54 GMT
    Find Nodes?
      Voting Booth?