PerlMonks  

Re: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?

by choroba (Canon)
on Feb 23, 2013 at 01:44 UTC (#1020265)


in reply to Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?

No need for a hash; an array should be enough. Try the following:

#!/usr/bin/perl
use warnings;
use strict;

open my $GENES,     '<', 'genes'     or die $!;
open my $LOCATIONS, '<', 'locations' or die $!;

chomp(my @locations = map { (split ' ')[1] } <$LOCATIONS>);
# If IDs are not already sorted, uncomment the following line:
# @locations = sort { $a <=> $b } @locations;

for (<$GENES>) {
    my ($chromosome, $start, $end) = split ' ';
    print "$chromosome\t$start\t$end";
    my $idx = 0;         # For $end, start searching where you left off for $start.
    my $correction = 0;  # Needed for Start(-) == Start and End(+) == End.
    for my $pos ($start, $end) {
        # Test the bound first so $locations[$idx] is never read past the end.
        $idx++ while $idx <= $#locations and $locations[$idx] <= $pos - $correction;
        die "No numbers around $pos ($idx)\n" if $idx == 0 or $idx > $#locations;
        print "\t$locations[$idx-1]\t$locations[$idx]";
        $correction = 1;
    }
    print "\n";
}
Update: Fixed border cases.
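The inner loop is a linear scan: each position is compared against the sorted @locations until the first larger value is found. The sketch below factors that scan into a subroutine so the border rules can be checked in isolation. The name bracket and its argument list are hypothetical, not part of the post above, and like the original it assumes integer positions (the $pos - 1 trick for the end coordinate relies on that).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a sorted array ref, a position and a starting
# index, return (index, lower, upper) such that
#   lower <= pos <  upper   when $is_end is false (start of a gene)
#   lower <  pos <= upper   when $is_end is true  (end of a gene)
# Assumes integer positions. Returns an empty list when pos is not
# surrounded by two values.
sub bracket {
    my ($locations, $pos, $idx, $is_end) = @_;
    my $correction = $is_end ? 1 : 0;
    # Check the bound first so we never read past the end of the array.
    $idx++ while $idx <= $#$locations
             and $locations->[$idx] <= $pos - $correction;
    return if $idx == 0 or $idx > $#$locations;
    return ($idx, $locations->[$idx - 1], $locations->[$idx]);
}

my @locations = (10, 20, 30, 40);
my ($i, $lo, $hi) = bracket(\@locations, 20, 0, 0);   # start == 20
print "start: $lo $hi\n";
($i, $lo, $hi) = bracket(\@locations, 30, $i, 1);     # end == 30, resume at $i
print "end: $lo $hi\n";
```

Passing the returned index back in lets the end search resume where the start search stopped, which is what declaring $idx outside the inner for loop achieves in the original.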
لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ


Re^2: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?
by pickleswarlz (Initiate) on Feb 26, 2013 at 19:38 UTC

    Thank you very much! I am using the first suggestion, as I don't really understand the binary search yet, and so far it has had no problems searching the 500,000 rows.
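The binary search mentioned here comes from another reply that is not shown on this page. As a rough sketch of the general idea only (not that reply's code), a binary search finds the first element greater than a position in O(log n) comparisons instead of scanning from the start. The sub name first_greater is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the index of the first element of the sorted
# array @$sorted that is strictly greater than $pos, or scalar(@$sorted)
# if there is none. Runs in O(log n) comparisons.
sub first_greater {
    my ($sorted, $pos) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);    # search window is [$lo, $hi)
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] <= $pos) {
            $lo = $mid + 1;                  # answer lies to the right of $mid
        }
        else {
            $hi = $mid;                      # $mid itself may be the answer
        }
    }
    return $lo;
}

my @locations = (10, 20, 30, 40);
my $idx = first_greater(\@locations, 25);
# Neighbours of 25 are $locations[$idx - 1] and $locations[$idx].
print "$locations[$idx - 1] $locations[$idx]\n";    # prints "20 30"
```

For the end coordinate, calling first_greater(\@locations, $end - 1) reproduces the End(+) == End rule of the linear version, again assuming integer positions. Check that the returned index is neither 0 nor scalar(@locations) before using both neighbours.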

    I am now running into a problem: in the resulting CSV file, the output of the different print commands ends up on separate lines. Here is my code:

    #!/usr/bin/perl
    use warnings;
    use strict;

    open my $GENES,     '<', 'chr1data.csv' or die $!;
    open my $LOCATIONS, '<', 'chr1snps.csv' or die $!;

    chomp(my @locations = map { (split ',')[2] } <$LOCATIONS>);
    # If IDs are not already sorted, uncomment the following line:
    # @locations = sort { $a <=> $b } @locations;

    for (<$GENES>) {
        my ($chromosome, $start, $end) = split ',';
        print "$chromosome,$start,$end";
        my $idx = 0;         # For $end, start searching where you left off for $start.
        my $correction = 0;  # Needed for Start(-) == Start and End(+) == End.
        for my $pos ($start, $end) {
            $idx++ while $locations[$idx] <= $pos - $correction and $idx <= $#locations;
            die "No numbers around $pos ($idx) \n" if $idx == 0 or $idx > $#locations;
            print ",$locations[$idx-1],$locations[$idx]";
            $correction = 1;
        }
        print "\n";
    }

    The statement print ",$locations[$idx-1],$locations[$idx]"; puts its output on a new line. I'd like it to appear on the same line as the output of print "$chromosome,$start,$end"; for each search. Do I have a \n in the wrong place?

      Your $end probably still ends with a newline (I split on ' ', which discards trailing whitespace; you split on a comma, which does not). Just add
      chomp $end;
      before printing it.
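A quick way to see the difference: split ' ' treats the trailing newline as whitespace and drops it, while split ',' leaves it attached to the last field. A minimal sketch, using a made-up input line in place of a record from chr1data.csv:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "chr1,100,200\n";    # made-up record, as read from a file

# Splitting on a comma leaves the newline attached to the last field.
my ($chr, $start, $end) = split ',', $line;
print length($end), "\n";       # prints 4: "200" plus the newline

# split ' ' is special: it splits on runs of whitespace, and the trailing
# newline counts as whitespace, so it is dropped.
my @fields = split ' ', "chr1 100 200\n";
print scalar(@fields), "\n";    # prints 3

# Fix: chomp the line (or just $end) before splitting and printing.
chomp(my $clean = $line);
my ($chr2, $start2, $end2) = split ',', $clean;
print "$chr2,$start2,$end2,extra\n";   # everything stays on one line
```

Inside the loop from the question, an equivalent change is a bare chomp; as the first statement of the for body, which strips the newline from $_ before split ',' runs.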
        Brilliant, works perfectly! Thanks!
