in reply to Re: Faulty Control Structures?
in thread Faulty Control Structures?

Thanks for your input. Unfortunately, this won't work, as there isn't a good selection of unique identifiers to use as keys for the hash. Using the code you provided, I'd end up with 23 key-value pairs when I need 330k :). A hash of arrays would work better, in that I could append the values to the array for each chromosome, but then getting the data out would be a bit of a nightmare. I will look into cleaning up the globals though, as I was being a bit lazy there :).
EDIT: Actually, 24 combinations, as there are both x and y to consider :).

Re^3: Faulty Control Structures?
by Narveson (Chaplain) on Jan 29, 2008 at 04:55 UTC

    You're right. I overlooked the statement label in one of your next statements. I could not have arrived at my misreading if I had been as aware as you are that there are only two dozen chromosomes.

    But what about your hash of arrays? Why would getting the data out be such a nightmare?

    Populating the hash of arrays:

        open my $annotation_read_handle, '<', $annotation_file
            or die "Cannot open $annotation_file: $!";

        my %annotations_for;
        while ( my $ad = <$annotation_read_handle> ) {

            # read $an_chrom out of $ad
            my ($an_chrom, undef) = split(/\t/, $ad);

            # store for future lookups
            push @{ $annotations_for{$an_chrom} }, $ad;
        }
        close $annotation_read_handle;

    Now read through the main data file and assign each chromosome number to my $main_chrom.

        # look up the list of annotations relevant to the current chromosome
        my $annotations_ref = $annotations_for{$main_chrom};

        # loop through just these annotations
        ILC: foreach my $ad (@$annotations_ref) {
            # ...
        }
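
    To show that getting the data back out need not be painful, here is a minimal self-contained sketch of the round trip. The sample lines and their column layout (chromosome, start, end, name) are invented for illustration and are not taken from your files; only the first tab-separated field being the chromosome is assumed from the code above.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data: tab-separated lines whose first field is the
    # chromosome. The remaining columns are invented for illustration.
    my @annotation_lines = (
        "1\t100\t200\tgeneA\n",
        "1\t300\t400\tgeneB\n",
        "X\t150\t250\tgeneC\n",
    );

    # Populate the hash of arrays: one array of lines per chromosome.
    my %annotations_for;
    for my $ad (@annotation_lines) {
        my ($an_chrom) = split /\t/, $ad;
        push @{ $annotations_for{$an_chrom} }, $ad;
    }

    # Getting the data out: a single hash lookup returns a reference to
    # the array of annotation lines for that chromosome.
    for my $main_chrom (qw(1 X Y)) {

        # Fall back to an empty list for a chromosome with no annotations.
        my $annotations_ref = $annotations_for{$main_chrom} || [];

        printf "chromosome %s: %d annotation(s)\n",
            $main_chrom, scalar @$annotations_ref;

        for my $ad (@$annotations_ref) {
            my (undef, $start, $end, $name) = split /\t/, $ad;
            chomp $name;
            print "  $name $start-$end\n";
        }
    }
    ```

    The hash never holds more than one key per chromosome, so two dozen keys at most, while the arrays behind those keys can hold all 330k records between them.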

    Of course, as other more enlightened commentators have already pointed out, the most important thing to optimize is the range_find subroutine.