Re: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?

by choroba (Cardinal) on Feb 23, 2013 at 01:44 UTC


in reply to Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?

No need for a hash, an array should be enough. Try the following:
```perl
#!/usr/bin/perl
use warnings;
use strict;

open my $GENES,     '<', 'genes'     or die $!;
open my $LOCATIONS, '<', 'locations' or die $!;

chomp(my @locations = map { (split ' ')[1] } <$LOCATIONS>);
# If IDs are not already sorted, uncomment the following line:
# @locations = sort { $a <=> $b } @locations;

for (<$GENES>) {
    my ($chromosome, $start, $end) = split ' ';
    print "$chromosome\t$start\t$end";
    my $idx = 0;         # For $end, start searching where you left for $start.
    my $correction = 0;  # Needed for Start(-) == Start and End(+) == End.
    for my $pos ($start, $end) {
        # Test the bound first so $locations[$idx] is never read past the end.
        $idx++ while $idx <= $#locations
                 and $locations[$idx] <= $pos - $correction;
        die "No numbers around $pos ($idx)\n" if $idx == 0 or $idx > $#locations;
        print "\t$locations[$idx-1]\t$locations[$idx]";
        $correction = 1;
    }
    print "\n";
}
```
Update: Fixed border cases.
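To see the border cases concretely, the same loop can be run on small in-memory inputs. The data here is made up for illustration (the real 'genes' and 'locations' files are not shown in the thread), the files are replaced by filehandles opened on strings, and each output row is collected in @out rather than printed piece by piece. The gene end (400) deliberately coincides with a location.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Made-up sample data standing in for the 'genes' and 'locations' files.
my $genes_txt     = "chr1 300 400\n";
my $locations_txt = "snp1 100\nsnp2 250\nsnp3 400\nsnp4 900\n";

# Perl can open a filehandle on a string, so the loop body runs unchanged.
open my $GENES,     '<', \$genes_txt     or die $!;
open my $LOCATIONS, '<', \$locations_txt or die $!;

my @locations = map { (split ' ')[1] } <$LOCATIONS>;

my @out;
for (<$GENES>) {
    my ($chromosome, $start, $end) = split ' ';
    my @row = ($chromosome, $start, $end);
    my $idx        = 0;    # carried over from $start to $end
    my $correction = 0;    # 0 for $start, 1 for $end
    for my $pos ($start, $end) {
        $idx++ while $idx <= $#locations
                 and $locations[$idx] <= $pos - $correction;
        die "No numbers around $pos ($idx)\n" if $idx == 0 or $idx > $#locations;
        push @row, @locations[$idx - 1, $idx];
        $correction = 1;
    }
    push @out, join "\t", @row;
}
print "$_\n" for @out;
```

This prints chr1, 300, 400, 250, 400, 250, 400 separated by tabs. For the start, 250 is the last location not above 300 and 400 the first one above it; for the end, the $correction of 1 makes the search stop at the first location that is at least 400, so a location exactly equal to the end is reported as its right-hand neighbour.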
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^2: Reading values from one .csv, searching for closest values in second .csv, returning results in third .csv?
by pickleswarlz (Initiate) on Feb 26, 2013 at 19:38 UTC

    Thank you very much! I am using the first suggestion, as I don't really understand the binary search yet, and so far it doesn't seem to have problems with searching the 500,000 rows.
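Since the binary search came up: instead of stepping through @locations one element at a time, it halves the search window on every comparison, so a lookup in 500,000 sorted values needs about log2(500,000), roughly 19, comparisons. The sketch below assumes @locations is sorted numerically; the sub names first_greater and bracket are hypothetical, not taken from the thread.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: return the index of the first element of the
# numerically sorted array @$sorted that is greater than $pos.
# Returns scalar(@$sorted) when no element is greater.
sub first_greater {
    my ($pos, $sorted) = @_;
    my ($lo, $hi) = (0, scalar @$sorted);    # search window is [$lo, $hi)
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($sorted->[$mid] <= $pos) {
            $lo = $mid + 1;                  # answer lies right of $mid
        } else {
            $hi = $mid;                      # $mid itself may be the answer
        }
    }
    return $lo;
}

# Hypothetical helper: return the closest values on either side of $pos
# (last <= $pos, first > $pos), or an empty list at the borders.
sub bracket {
    my ($pos, $sorted) = @_;
    my $idx = first_greater($pos, $sorted);
    return if $idx == 0 or $idx > $#$sorted;
    return @$sorted[$idx - 1, $idx];
}

my @locations = (100, 250, 400, 900);
print join("\t", bracket(300, \@locations)), "\n";    # prints 250 and 400
```

bracket($start, \@locations) gives the pair the linear loop prints for $start; for $end, calling bracket($end - 1, \@locations) reproduces the $correction trick, assuming integer positions.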

    I am now running into the problem of the results being output on different lines in the resulting CSV file for each print command. Here is my code:

    ```perl
    #!/usr/bin/perl
    use warnings;
    use strict;

    open my $GENES,     '<', 'chr1data.csv' or die $!;
    open my $LOCATIONS, '<', 'chr1snps.csv' or die $!;

    chomp(my @locations = map { (split ',')[2] } <$LOCATIONS>);
    # If IDs are not already sorted, uncomment the following line:
    # @locations = sort { $a <=> $b } @locations;

    for (<$GENES>) {
        my ($chromosome, $start, $end) = split ',';
        print "$chromosome,$start,$end";
        my $idx = 0;         # For $end, start searching where you left for $start.
        my $correction = 0;  # Needed for Start(-) == Start and End(+) == End.
        for my $pos ($start, $end) {
            $idx++ while $locations[$idx] <= $pos - $correction
                     and $idx <= $#locations;
            die "No numbers around $pos ($idx) \n" if $idx == 0 or $idx > $#locations;
            print ",$locations[$idx-1],$locations[$idx]";
            $correction = 1;
        }
        print "\n";
    }
    ```

    The statement print ",$locations[$idx-1],$locations[$idx]"; puts its output on a new line. I'd like it to appear on the same line as the output of print "$chromosome,$start,$end"; for each search. Do I have a \n in the wrong place?

      Your $end probably contains a newline (I split on ' ', which removes it; you split on a comma, which doesn't). Just add
      chomp $end;
      before printing it.
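An equivalent approach is to chomp each line once before splitting, so no field can carry the newline. The sketch below does that with a hypothetical helper, parse_gene_line, and also shows why the whitespace-separated version never had the problem: split ' ' is Perl's special whitespace split, which splits on runs of whitespace and drops the empty trailing field left after the newline, whereas split ',' only cuts at commas.

```perl
#!/usr/bin/perl
use warnings;
use strict;

# Hypothetical helper: remove the line ending before splitting, so the
# last field cannot carry a stray "\n" into the output.
sub parse_gene_line {
    my ($line) = @_;
    chomp $line;                 # strips a trailing "\n" (the value of $/)
    return split ',', $line;
}

my $raw = "chr1,1000,2000\n";    # a line as read from <$GENES>

# split ',' alone leaves the newline attached to the last field ...
my @naive = split ',', $raw;
print "split ',': last field ends in a newline\n" if $naive[-1] =~ /\n\z/;

# ... while split ' ' (the special whitespace split) discards it.
my @ws = split ' ', "chr1 1000 2000\n";
print "split ' ': last field is '$ws[-1]'\n";

my ($chromosome, $start, $end) = parse_gene_line($raw);
print "$chromosome,$start,$end\n";
```

If the CSV was saved with Windows line endings and is read on a Unix-like system, chomp removes only the "\n" and leaves a "\r" behind; s/\r?\n\z// strips either ending.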
        Brilliant, works perfectly! Thanks!
