http://www.perlmonks.org?node_id=998945


in reply to Find overlap

Here is one way to do it:

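    #!/usr/bin/perl
    use warnings;
    use strict;

    @ARGV = ( '148Nsorted.bed', '162Nsorted.bed', '174Nsorted.bed', '175Nsorted.bed' );

    my %data;
    while ( <> ) {
        # Keep only lines of the form: name start end
        /^\s*(\S+)\s+(\d+)\s+(\d+)\s*$/ or next;
        # One byte per position: '0' before the range, '1' from start to end.
        # String bitwise OR merges this range into the string stored for $1.
        $data{ $1 } |= '0' x ( $2 - 1 ) . '1' x ( $3 - ( $2 - 1 ) );
    }

    keys( %data ) == 1 or die "Error: too many keys.\n";
    my ( $name, $string ) = each %data;

    # A '1', then one or more '0's, then a '1' means the ranges leave a gap
    $string =~ /10+1/ and die "Error: no overlap.\n";

    # $+[0] is the offset just past the end of the match, i.e. the
    # 1-based position of the first '1' and of the last '1'
    my ( $start, $end );
    $start = $+[ 0 ] if $string =~ /^0*1/;
    $end   = $+[ 0 ] if $string =~ /.*1/;

    print "$name\t$start\t$end\n";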

Re^2: Find overlap
by Anonymous Monk on Oct 14, 2012 at 16:32 UTC
    Hi, thanks, but this doesn't work for these files. I get the error "Too many keys".
Re^2: Find overlap
by linseyr (Acolyte) on Oct 14, 2012 at 16:46 UTC
    Sorry, but I'm pretty new to Perl, so I don't really know what this code does. Could you give me some explanation? And why do I get the error "Too many keys" when I try to run it? Thanks.

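      The code builds one string per name in the first column, using one byte per position: bytes before a range's start are the character '0' and bytes from the start to the end are '1'. The string bitwise OR operator (|=) then merges each new range into the string built so far. Because '0' is 0x30 and '1' is 0x31, ORing the two gives '1', so every position covered by at least one range ends up as '1'. If a '0' is left between two '1's, the ranges do not form one continuous region and the script dies with "no overlap"; otherwise it prints the positions of the first and last '1' as start and end.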
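A minimal standalone sketch of the same string operations, run on a few hard-coded ranges instead of the .bed files, may make this easier to follow. The helper name mark_range and the sample ranges are made up for illustration; the expressions inside are the ones the script uses.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: build the '0'/'1' string for one range, exactly as
# the loop body does ('0' before $start, '1' from $start to $end, 1-based).
sub mark_range {
    my ( $start, $end ) = @_;
    return '0' x ( $start - 1 ) . '1' x ( $end - ( $start - 1 ) );
}

# '0' | '1' is '1' bytewise; the shorter string is treated as if padded
# with "\0", so bytes past its end are copied from the longer string.
my $merged = '';
$merged |= mark_range( @$_ ) for [ 3, 5 ], [ 5, 8 ], [ 2, 4 ];
print "$merged\n";    # prints 01111111

# $+[0] gives the 1-based position of the first and of the last '1'
my ( $start, $end );
$start = $+[ 0 ] if $merged =~ /^0*1/;
$end   = $+[ 0 ] if $merged =~ /.*1/;
print "$start\t$end\n";    # prints 2 and 8, tab-separated

# Two ranges with a hole between them leave '0's between '1's
my $gappy = mark_range( 1, 2 ) | mark_range( 5, 6 );
print "gap found in $gappy\n" if $gappy =~ /10+1/;    # 110011
```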

      And why do I get the error "Too many keys" when I try to run it?

      The keys of %data are the names in the first column of the data files (here 'chr1'), so that error means at least one matching line in one of the files has something other than 'chr1' in its first column.
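To find out which lines are responsible, a small check along these lines could help. It is a sketch, not part of the original answer: the sub name unexpected_names is hypothetical, and it assumes that 'chr1' is the only name that should appear and that the files are passed on the command line.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: return a message for every data line whose first
# column is not $expected (same line pattern as the main script).
sub unexpected_names {
    my ( $expected, @files ) = @_;
    my @found;
    for my $file ( @files ) {
        open my $fh, '<', $file or die "Cannot open $file: $!\n";
        while ( my $line = <$fh> ) {
            next unless $line =~ /^\s*(\S+)\s+(\d+)\s+(\d+)\s*$/;
            # $. is the current line number of $fh (1 for the first line)
            push @found, sprintf( "%s line %d: first column is '%s'", $file, $., $1 )
                if $1 ne $expected;
        }
        close $fh;
    }
    return @found;
}

# Usage: perl check.pl 148Nsorted.bed 162Nsorted.bed 174Nsorted.bed 175Nsorted.bed
print "$_\n" for unexpected_names( 'chr1', @ARGV );
```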

      After thinking about the problem and rereading it, it seems that ALL files must overlap, so this may work better:

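          #!/usr/bin/perl
          use warnings;
          use strict;

          @ARGV = ( '148Nsorted.bed', '162Nsorted.bed', '174Nsorted.bed', '175Nsorted.bed' );

          # Each file gets its own bit: 1, 2, 4, 8, ... (at most 8 files,
          # because each position is stored in a single byte)
          my $bit_mask = 1;
          my %data;
          while ( <> ) {
              if ( /^\s*(\S+)\s+(\d+)\s+(\d+)\s*$/ ) {
                  # chr(0) before the range, this file's bit from start to end
                  $data{ $1 } |= chr( 0 ) x ( $2 - 1 ) . chr( $bit_mask ) x ( $3 - ( $2 - 1 ) );
              }
              # Move on to the next bit when the current file is finished
              $bit_mask <<= 1 if eof;
          }

          keys( %data ) == 1 or die "Error: too many keys: @{[ keys %data ]}\n";
          my ( $name, $string ) = each %data;

          # A byte of 0x0f has all four file bits set: every file covers that position
          $string =~ /\x0f/ or die "Error: no overlap on all files.\n";

          # First non-NUL byte = smallest start; string length = largest end
          $string =~ /[^\0]/;
          my ( $start, $end ) = ( $+[ 0 ], length $string );

          print "$name\t$start\t$end\n";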
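The per-file bit idea can be tried on its own with a sketch like the one below, which uses four hypothetical ranges (one per "file") instead of reading the .bed files. It assumes at most eight inputs, since each position is stored in one byte. Note that it reports the positions covered by every input, which is not the same thing as the start and end printed by the script above (those come from the first non-NUL byte and the string length).

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical sample data: one range (start, end) per file, 1-based inclusive
my @ranges = ( [ 3, 9 ], [ 5, 12 ], [ 1, 7 ], [ 6, 10 ] );

my ( $string, $bit_mask ) = ( '', 1 );
for my $r ( @ranges ) {
    my ( $start, $end ) = @$r;
    # chr(0) before the range, this file's bit inside it
    $string |= chr( 0 ) x ( $start - 1 ) . chr( $bit_mask ) x ( $end - ( $start - 1 ) );
    $bit_mask <<= 1;
}

# A byte with every file's bit set (0x0f for four files) is covered by all of them
my $all    = chr( $bit_mask - 1 );
my @common = grep { substr( $string, $_ - 1, 1 ) eq $all } 1 .. length $string;

if ( @common ) {
    print "all files overlap at $common[0]..$common[-1]\n";    # prints: all files overlap at 6..7
} else {
    print "no position is covered by every file\n";
}
```

With these sample ranges the shared positions are 6 and 7: the latest start is 6 and the earliest end is 7.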