Re: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?

No need for a hash; an array should be enough. Try the following:

#!/usr/bin/perl
use warnings;
use strict;

open my $GENES,     '<', 'genes'     or die $!;
open my $LOCATIONS, '<', 'locations' or die $!;

chomp(my @locations = map { (split ' ')[1] } <$LOCATIONS>);
# If IDs are not already sorted, uncomment the following line:
# @locations = sort { $a <=> $b } @locations;

while (<$GENES>) {      # Read one line at a time instead of building a list of all lines.
    my ($chromosome, $start, $end) = split ' ';
    print "$chromosome\t$start\t$end";
    my $idx = 0;        # For $end, resume searching where you left off for $start.
    my $correction = 0; # Needed for Start(-) == Start and End(+) == End.
    for my $pos ($start, $end) {
        # Check the bounds first so we never compare an undefined element.
        $idx++ while $idx <= $#locations
                     and $locations[$idx] <= $pos - $correction;
        die "No numbers around $pos ($idx)\n"
            if $idx == 0 or $idx > $#locations;
        print "\t$locations[$idx-1]\t$locations[$idx]";
        $correction = 1;
    }
    print "\n";
}

Update: Fixed border cases.
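
To make the border cases concrete, here is a minimal sketch that wraps the same forward scan in a helper. The name `neighbours` and the sample numbers are made up for illustration, and it assumes the positions are integers (otherwise subtracting 1 as the correction would not mean "strictly less than").

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: find the neighbours of $pos in the sorted list
# @$locations, scanning forward from index $idx.
#   $correction = 0: (last value <= $pos, first value >  $pos), used for $start
#   $correction = 1: (last value <  $pos, first value >= $pos), used for $end
# Returns (before, after, idx), or an empty list if one side has no neighbour.
sub neighbours {
    my ($locations, $pos, $correction, $idx) = @_;
    $idx++ while $idx <= $#$locations
             and $locations->[$idx] <= $pos - $correction;
    return if $idx == 0 or $idx > $#$locations;
    return ($locations->[$idx - 1], $locations->[$idx], $idx);
}

my @sample = (10, 20, 30, 40);    # made-up, already sorted

# Start equal to a location: that location counts as the one "before".
my @start = neighbours(\@sample, 20, 0, 0);

# End equal to a location: that location counts as the one "after".
# The scan resumes at the index where the search for the start stopped.
my @end = neighbours(\@sample, 30, 1, $start[2]);

print "@start[0, 1] @end[0, 1]\n";    # prints "20 30 20 30"
```

A position below the first location or at or above the last one (for the start) gives an empty list here, which corresponds to the `die` in the script above.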

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Re^2: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv? by pickleswarlz (Initiate) on Feb 26, 2013 at 19:38 UTC
Thank you very much! I am using the first suggestion, as I don't really understand the binary search yet, and so far it doesn't seem to have problems with searching the 500,000 rows. I am now running into the problem of the results being output on different lines in the resulting CSV file for each print command. Here is my code:

#!/usr/bin/perl
use warnings;
use strict;

open my $GENES,     '<', 'chr1data.csv' or die $!;
open my $LOCATIONS, '<', 'chr1snps.csv' or die $!;

chomp(my @locations = map { (split ',')[2] } <$LOCATIONS>);
# If IDs are not already sorted, uncomment the following line:
# @locations = sort { $a <=> $b } @locations;

for (<$GENES>) {
    my ($chromosome, $start, $end) = split ',';
    print "$chromosome,$start,$end";
    my $idx = 0;        # For $end, start searching where you left for $start.
    my $correction = 0; # Needed for Start(-) == Start and End(+) == End.
    for my $pos ($start, $end) {
        $idx++ while $locations[$idx] <= $pos - $correction
                     and $idx <= $#locations;
        die "No numbers around $pos ($idx) \n"
            if $idx == 0 or $idx > $#locations;
        print ",$locations[$idx-1],$locations[$idx]";
        $correction = 1;
    }
    print "\n";
}

Printing `print ",$locations[$idx-1],$locations[$idx]";` puts this information on a new line. I'd like it to come out on the same line as `print "$chromosome,$start,$end";` for each search. Do I have a `\n` in the wrong place?
Re^3: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv? by choroba (Cardinal) on Feb 26, 2013 at 22:42 UTC
Your $end probably contains a newline (I split on ' ', which removed it; you split on a comma). Just `chomp $end;` before printing it.
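
A minimal sketch of the difference, assuming a made-up input line; `parse_row` is a hypothetical helper, not something from your script. Chomping the whole line before splitting has the same effect as chomping $end afterwards.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: strip the trailing newline, then split on commas.
sub parse_row {
    my ($line) = @_;
    chomp $line;                  # removes the trailing "\n", if there is one
    return split ',', $line;
}

my $line = "chr1,100,200\n";      # made-up sample row

# Splitting on a comma leaves the newline attached to the last field.
my @raw = split ',', $line;
print "unchomped last field ends with a newline\n" if $raw[-1] =~ /\n\z/;

# After chomp, more columns can be appended on the same output line.
my ($chromosome, $start, $end) = parse_row($line);
print "$chromosome,$start,$end", ",more,columns\n";
```

If the CSV was saved with Windows line endings and is read on Unix, a "\r" may still be left on the last field; that case is not handled in this sketch.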
Re^4: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv? by pickleswarlz (Initiate) on Feb 27, 2013 at 20:53 UTC
Brilliant, works perfectly! Thanks!

