Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?

by choroba (Abbot)
on Feb 23, 2013 at 01:44 UTC ( #1020265=note: print w/ replies, xml ) Need Help??


in reply to Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?

No need for a hash, an array should be enough. Try the following:

#!/usr/bin/perl use warnings; use strict; open my $GENES, '<', 'genes' or die $!; open my $LOCATIONS, '<', 'locations' or die $!; chomp(my @locations = map { (split ' ')[1] } <$LOCATIONS>); # If IDs are not already sorted, uncomment the following line: # @locations = sort { $a <=> $b } @locations; for (<$GENES>) { my ($chromosome, $start, $end) = split ' '; print "$chromosome\t$start\t$end"; my $idx = 0; # For $end, start searching where you left for + $start. my $correction = 0; # Needed for Start(-) == Start and End(+) == E +nd. for my $pos ($start, $end) { $idx++ while $locations[$idx] <= $pos - $correction and $idx <= $#locations; die "No numbers around $pos ($idx)\n" if $idx == 0 or $idx > $#locations; print "\t$locations[$idx-1]\t$locations[$idx]"; $correction = 1; } print "\n"; }
Update: Fixed border cases.
لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Comment on Re: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?
Download Code
Re^2: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?
by pickleswarlz (Initiate) on Feb 26, 2013 at 19:38 UTC

    Thank you very much! I am using the first suggestion, as I don't really understand the binary search yet, and so far it doesn't seem to have problems with searching the 500,000 rows.

    I am now running into the problem of the results being output on different lines in the resulting CSV file for each print command. Here is my code:

    #!/usr/bin/perl use warnings; use strict; open my $GENES, '<', 'chr1data.csv' or die $!; open my $LOCATIONS, '<', 'chr1snps.csv' or die $!; chomp(my @locations = map { (split ',')[2] } <$LOCATIONS>); # If IDs are not already sorted, uncomment the following line: # @locations = sort { $a <=> $b } @locations; for (<$GENES>) { my ($chromosome, $start, $end) = split ','; print "$chromosome,$start,$end"; my $idx = 0; # For $end, start searching where you left for + $start. my $correction = 0; # Needed for Start(-) == Start and End(+) == E +nd. for my $pos ($start, $end) { $idx++ while $locations[$idx] <= $pos - $correction and $idx <= $#locations; die "No numbers around $pos ($idx) \n" if $idx == 0 or $idx > $#locations; print ",$locations[$idx-1],$locations[$idx]"; $correction = 1; } print "\n"; }

    Printing print ",$locations[$idx-1],$locations[$idx]"; puts this information on a new line. I'd like it to come out on the same line as print "$chromosome,$start,$end"; for each search. Do I have a \n in the wrong place?

      Your $end probably contains newline (I split on ' ', which removed it, you split on a comma). Just
      chomp $end;
      before printing it.
      لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
        Brilliant, works perfectly! Thanks!

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1020265]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chanting in the Monastery: (12)
As of 2014-09-30 21:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    How do you remember the number of days in each month?











    Results (384 votes), past polls