Thank you very much! I am using the first suggestion, as I don't really understand the binary search yet, and so far it has had no trouble searching the 500,000 rows.
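For comparison, the binary-search idea can be sketched in Perl like this. It is a minimal sketch that assumes @locations is sorted numerically. The subroutine name `first_greater` is hypothetical. It returns the index of the first element greater than the target, which is the same `$idx` the `while` loop in the code below ends up with, but it gets there in O(log n) comparisons instead of a linear scan.

```perl
use strict;
use warnings;

# Hypothetical helper: return the index of the first element of the
# numerically sorted array @$sorted that is greater than $target,
# or scalar(@$sorted) if no such element exists.
sub first_greater {
    my ($sorted, $target) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);   # search the half-open range [lo, hi)
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] <= $target) {
            $lo = $mid + 1;                  # answer lies to the right of mid
        } else {
            $hi = $mid;                      # mid itself may be the answer
        }
    }
    return $lo;
}

my @locations = (100, 250, 400, 975);
my $idx = first_greater(\@locations, 300);
print "$locations[$idx-1],$locations[$idx]\n";   # prints 250,400
```

Calling it as `first_greater(\@locations, $pos - $correction)` should reproduce the linear scan's result for both `$start` and `$end`, as long as each end position is greater than its start.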
I am now running into a new problem: the output of each print command ends up on a separate line in the resulting CSV file. Here is my code:
#!/usr/bin/perl
use warnings;
use strict;
open my $GENES, '<', 'chr1data.csv' or die $!;
open my $LOCATIONS, '<', 'chr1snps.csv' or die $!;
chomp(my @locations = map { (split ',')[2] } <$LOCATIONS>);
# If IDs are not already sorted, uncomment the following line:
# @locations = sort { $a <=> $b } @locations;
for (<$GENES>) {
    my ($chromosome, $start, $end) = split ',';
    print "$chromosome,$start,$end";
    my $idx = 0;          # For $end, start searching where you left off for $start.
    my $correction = 0;   # Needed for Start(-) == Start and End(+) == End.
    for my $pos ($start, $end) {
        # Check the bound first so $locations[$idx] is never read past the end.
        $idx++ while $idx <= $#locations
                 and $locations[$idx] <= $pos - $correction;
        die "No numbers around $pos ($idx)\n"
            if $idx == 0 or $idx > $#locations;
        print ",$locations[$idx-1],$locations[$idx]";
        $correction = 1;
    }
    print "\n";
}
The statement print ",$locations[$idx-1],$locations[$idx]"; puts its output on a new line. I'd like it to appear on the same line as the output of print "$chromosome,$start,$end"; for each search. Do I have a \n in the wrong place?
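One likely cause worth checking: each line read by `for (<$GENES>)` keeps its trailing newline, and because `$end` is the last field produced by `split`, it still ends in "\n". So print "$chromosome,$start,$end" would already finish the line before the neighbour columns are printed. The minimal sketch below, using a made-up sample line and a hypothetical helper `format_record`, shows that calling `chomp` before splitting keeps the whole record on one line.

```perl
use strict;
use warnings;

# Hypothetical helper: build one output record from a raw input line.
# $extra stands in for the neighbour columns printed later in the loop.
sub format_record {
    my ($line, $extra) = @_;
    chomp $line;    # drop the trailing "\n" kept by line-by-line reads
    my ($chromosome, $start, $end) = split /,/, $line;
    return "$chromosome,$start,$end$extra\n";   # one newline, at the very end
}

# Made-up sample input standing in for a line of chr1data.csv.
my $raw = "chr1,1000,2000\n";
print format_record($raw, ",950,1200");    # prints chr1,1000,2000,950,1200
```

In the original loop, the equivalent change would be a plain `chomp;` as the first statement inside `for (<$GENES>) {`, since `split ','` works on `$_`.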