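You could turn the problem inside out: load the test intervals into memory, then scan the large reference file one line at a time and match each record against them. In the sample below the reference data is read from a string through an in-memory filehandle so that the script is self-contained; for real data, open the reference file instead: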
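#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the large reference file: chromosome, start, end, feature.
my $reps = <<REPS;
chr1 100 120 feature1
chr1 200 250 feature2
chr2 150 200 feature1
chr2 280 350 feature1
chr3 100 150 feature2
chr3 300 450 feature2
REPS

# Load the (small) set of test intervals: $tests{chr}{start}{end|rep}
my %tests;
while (my $line = <DATA>) {
    $line =~ s/[\n\r]//g;             # strip LF and any CR from DOS files
    my @array = split ' ', $line;     # ' ' also ignores leading whitespace
    next if @array < 3;               # skip blank or short lines
    $tests{$array[0]}{$array[1]}{'end'} = $array[2];
    $tests{$array[0]}{$array[1]}{'rep'} = $array[3];
}

# Scan the reference data one line at a time.
open my $repIn, '<', \$reps or die "Can't open reference data: $!";
while (<$repIn>) {
    my ($chr, $start, $end, $rep) = split ' ';
    next if !exists $tests{$chr};

    # keys returns the test starts in no particular order, so sort them
    # numerically: the early exit is only valid when they ascend.
    for my $s (sort { $a <=> $b } keys %{$tests{$chr}}) {
        last if $s >= $end;    # this and all later tests start past $end
        if ($start <= $tests{$chr}{$s}{'end'}) {
            print "$chr $start $end $rep\n";
            last;              # report each reference line only once
        }
    }
}
__DATA__
chr2 160 210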
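The early `last` is only correct if each chromosome's test starts are visited in ascending numeric order, and sorting the keys inside the loop repeats that sort for every reference line. A minimal sketch of one alternative, assuming the tests are held as `[start, end]` pairs per chromosome rather than in the nested hash: sort each list once after loading, so the per-line work is just a short scan. `%sorted` and `overlaps_any` are hypothetical names, and the sample intervals are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical test intervals, keyed by chromosome, as [start, end] pairs
# (the script above stores them in a nested hash instead).
my %tests = (
    chr2 => [ [ 200, 260 ], [ 160, 210 ] ],
);

# Sort each chromosome's intervals by start once, right after loading,
# rather than sorting again for every reference line.
my %sorted;
for my $chr (keys %tests) {
    $sorted{$chr} = [ sort { $a->[0] <=> $b->[0] } @{ $tests{$chr} } ];
}

# Return 1 if the reference interval overlaps any test interval on $chr,
# using the same two comparisons as the script above.
sub overlaps_any {
    my ($chr, $start, $end) = @_;
    my $list = $sorted{$chr} or return 0;    # no tests on this chromosome
    for my $t (@$list) {
        last if $t->[0] >= $end;          # sorted: later tests start even further right
        return 1 if $start <= $t->[1];    # reference starts before this test ends
    }
    return 0;
}

# Usage: print the reference records that overlap a test interval.
for my $rec ('chr2 150 200 feature1', 'chr2 280 350 feature1', 'chr1 100 120 feature1') {
    my ($chr, $start, $end, $rep) = split ' ', $rec;
    print "$rec\n" if overlaps_any($chr, $start, $end);
}
```

Sorting costs O(n log n) per chromosome once, at load time; each reference line then stops scanning as soon as it reaches a test that starts at or past the reference end.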
True laziness is hard work