in reply to Find overlap
Update: sorry, but this doesn't work. It misses many overlapping sets.
If I understand your objectives correctly, something like the following untested code may help. It assumes the input data is sorted by the first field. The method of detecting overlaps is quite different from what you posted. I don't know if it gives the same results. It has the advantage that it does a single pass over each set of input records.
use strict; use warnings; open(my $N148,'<',"148Nsorted.bed") or die $!; open(my $N162,'<',"162Nsorted.bed") or die $!; open(my $N174,'<',"174Nsorted.bed") or die $!; open(my $N175,'<',"175Nsorted.bed") or die $!; my ($chr148, $start148, $end148) = scan($N148); my ($chr162, $start162, $end162) = scan($N162); my ($chr174, $start174, $end174) = scan($N174); my ($chr175, $start175, $end175) = scan($N175); my ($minstart, $maxstart, $minend, $maxend); while(defined($chr148)) { $minstart = $start148; $maxstart = $start148; $minend = $end148; $maxend = $end148; while(defined($chr162) and $chr162 lt $chr148) { ($chr162, $start162, $end162) = scan($N162); } while(defined($chr162) and $chr162 eq $chr148) { $minstart = $start162 if($start162 < $minstart); $maxstart = $start162 if($start162 > $maxstart); $minend = $end162 if($end162 < $minend); $maxend = $end162 if($end162 > $maxend); if($maxstart <= $minend) { while(defined($chr174) and $chr174 lt $chr148) { ($chr174, $start174, $end174) = scan($N174); } while(defined($chr174) and $chr174 eq $chr148) { $minstart = $start174 if($start174 < $minstart); $maxstart = $start174 if($start174 > $maxstart); $minend = $end174 if($end174 < $minend); $maxend = $end174 if($end174 > $maxend); if($maxstart <= $minend) { while(defined($chr175) and $chr175 lt $chr148) { ($chr175, $start175, $end175) = scan($N175); } while(defined($chr175) and $chr175 eq $chr148) { $minstart = $start175 if($start175 < $minstart +); $maxstart = $start175 if($start175 > $maxstart +); $minend = $end175 if($end175 < $minend); $maxend = $end175 if($end175 > $maxend); if($maxstart <= $minend) { print "$chr148 $minstart $maxend\n"; } ($chr175, $start175, $end175) = scan($N175); } } ($chr174, $start174, $end174) = scan($N174); } } ($chr162, $start162, $end162) = scan($N162); } ($chr148, $start148, $end148) = scan($N148); } close($N148); close($N162); close($N174); close($N175); exit(0); sub scan { my ($fh) = @_; my $line = <$fh>; return() unless($line); return(split(/\t/, $line)); }
update: the following might do what you want. It will take more memory and time than the above but it will find all overalpping sets.
use strict; use warnings; use List::Util qw(min max); my $ranges148 = scan('148Nsorted.bed'); my $ranges162 = scan('162Nsorted.bed'); my $ranges174 = scan('174Nsorted.bed'); my $ranges175 = scan('175Nsorted.bed'); foreach my $chr (keys %$ranges148) { foreach my $range148 (@{$ranges148->{$chr}}) { foreach my $range162 (@{$ranges162->{$chr}}) { foreach my $range174 (@{$ranges174->{$chr}}) { foreach my $range175 (@{$ranges175->{$chr}}) { my $minstart = min( $range148->[0], $range162->[0], $range174->[0], $range175->[0]); my $maxstart = max( $range148->[0], $range162->[0], $range174->[0], $range175->[0]); my $minend = min( $range148->[1], $range162->[1], $range174->[1], $range175->[1]); my $maxend = max( $range148->[1], $range162->[1], $range174->[1], $range175->[1]); if($maxstart <= $minend) { print "overlapping $chr: $minstart, $maxend - +($range148->[0], $range148->[1]), ($range162->[0], $range162->[1]), ( +$range174->[0], $range174->[1]), ($range175->[0], $range175->[1])\n"; } else { print "not overlapping: $chr: $minstart, $maxe +nd - ($range148->[0], $range148->[1]), ($range162->[0], $range162->[1 +]), ($range174->[0], $range174->[1]), ($range175->[0], $range175->[1] +)\n"; } } } } } } exit(0); sub scan { my ($file) = @_; open(my $fh, '<', $file) or die "$file: $!"; my $ranges; foreach my $line (<$fh>) { chomp($line); my ($chr, $start, $end) = split(/\t/, $line); push(@{$ranges->{$chr}}, [ 0+$start, 0+$end ]); } return($ranges); }
|
---|
Replies are listed 'Best First'. | |
---|---|
Re^2: Find overlap
by linseyr (Acolyte) on Oct 14, 2012 at 20:21 UTC | |
by linseyr (Acolyte) on Oct 14, 2012 at 20:25 UTC |
In Section
Seekers of Perl Wisdom