in reply to Sorting Data By Overlapping Intervals
And, if the latter data snippet is "realistic", then your goal is to produce nine distinct text files, with lots of overlapping content across those nine files - for example, if a given line from the CG input has a value of 999999 in column 4, it will be included in all nine outputs, right? (Because "999999" falls within the range for all nine intervals in that data snippet.)
If I have that right, then I think you'll be better off reading the interval data first and building a hash that holds, for each interval, an output file handle along with its min and max values. Then read the CG data; as you look at each CG record, loop over the hash of intervals and print the record to every file handle whose range contains it. Something like:
(not tested)

    # set up your path strings for the two input files... then:

    my %intervals;
    open( INTERVALS, '<', $interval_path ) or die "$interval_path: $!\n";
    while (<INTERVALS>) {
        chomp;
        my ( $str, $min, $max ) = split;
        next unless ( $min =~ /^\d+$/ and $max =~ /^\d+$/ );
        my $out_path = "....";  # whatever makes a good name for this output...
        open( my $ofh, '>', $out_path ) or die "$out_path: $!\n";
        $intervals{$out_path} = { 'min' => $min, 'max' => $max, 'fh' => $ofh };
    }
    open( CG, '<', $cg_path ) or die "$cg_path: $!\n";
    while (<CG>) {
        my $keyval = ( split )[3];   # column 4 holds the value to test
        for my $output ( keys %intervals ) {
            if ( $keyval >= $intervals{$output}{'min'} and
                 $keyval <= $intervals{$output}{'max'} ) {
                print { $intervals{$output}{'fh'} } $_;
            }
        }
    }
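If you want to check the matching logic on its own before pointing it at real files, here is a minimal sketch that does the same inclusive min/max test in memory. Everything in it is made up for illustration: the three interval names and ranges, the sample records, and the helper name matching_outputs are my assumptions, not anything from your data.

```perl
use strict;
use warnings;

# Hypothetical intervals, invented for this example only; they are not
# the values from the original data snippet.
my %intervals = (
    'out_a.txt' => { min => 0,   max => 1_000_000 },
    'out_b.txt' => { min => 500, max => 2_000_000 },
    'out_c.txt' => { min => 10,  max => 20 },
);

# Return the names (in sorted order) of every interval whose inclusive
# min..max range contains $keyval. A value can match several intervals.
sub matching_outputs {
    my ( $keyval, $intervals ) = @_;
    my @hits = grep {
        $keyval >= $intervals->{$_}{min} && $keyval <= $intervals->{$_}{max}
    } sort keys %$intervals;
    return @hits;
}

# Made-up whitespace-separated records; column 4 (index 3) is the key value.
my @records = ( "rec1 a b 999999 c", "rec2 a b 15 c", "rec3 a b 5 c" );

for my $line (@records) {
    my $keyval = ( split ' ', $line )[3];
    my @hits   = matching_outputs( $keyval, \%intervals );
    print "$keyval -> @hits\n";
}
```

With overlapping ranges like these, a single record can land in more than one output, which is the same behaviour the file-writing loop above would give you.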
UPDATE: If your "intervals" list is really a lot longer than the 9-line snippet that you showed us, you might not be able to have that many output file handles open at once. In that case, move the open statement out of the first while loop (don't store an 'fh' element in the hash), and put it just before the print statement of the second loop (and change to append mode) - i.e.:
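    # ... (read the intervals as before, but without the open and the 'fh' element)
    while (<CG>) {
        my $keyval = ( split )[3];
        for my $output ( keys %intervals ) {
            if ( $keyval >= $intervals{$output}{'min'} and
                 $keyval <= $intervals{$output}{'max'} ) {
                # report $output here; $out_path is not in scope in this loop
                open( my $ofh, '>>', $output ) or die "$output: $!\n";
                print $ofh $_;
            }
        }
    }

By using a lexical scalar variable for the file handle, it will be closed automatically at each iteration, which is what you would want in this case. Since the files are now opened in append mode, remove any output files left over from an earlier run before you start, or the new lines will be added to the old ones.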