Iterating through file to find specific subsets of lines

thegirlm0nkey has asked for the wisdom of the Perl Monks concerning the following question:

Hi - getting extremely stuck and would love some insight! I have a file which contains sequential numbers (actually genomic co-ordinates, I'm an amateur bioinformatician!) and an associated score. I need to extract regions where the score dips below a certain level. The file looks something like this:

1   50
2   50
3   1
4   10
5   49
6   8
7   50
8   5
9   5
10   40

So in this example - the first number on each line is the co-ordinate, and the second is the score. I need all the regions scoring less than 50, so for the small example above, I would get something like:

3   6
8   10

Hope that makes sense - I'm basically looking for the first and last positions where the score is less than 50. So far I have slurped the file into an array like this:

foreach my $line (@lines) {
    chomp $line;
    my @columns = split(/\t/, $line);
    my $score = $columns[1];
    if ($score < 50) {
        #something here...
    }
}

But I'm stuck with the 'something here' - I need to keep track of the first time a score of less than 50 is seen, and the last time it is seen before it goes above 50, and capture the two corresponding $columns[0] numbers. Really hope I've explained this properly! TIA.


Replies are listed 'Best First'.
Re: Iterating through file to find specific subsets of lines by toolic (Bishop) on Dec 04, 2013 at 15:19 UTC
Keep track of the positions in an array:

```
use warnings;
use strict;

my @lines = <DATA>;
my @pos;
foreach my $line (@lines) {
    chomp $line;
    my @columns = split( /\s+/, $line );
    my $score = $columns[1];
    if ($score < 50) {
        push @pos, $columns[0];
    }
    else {
        print "@pos[0, -1]\n" if @pos;
        @pos = ();
    }
}
print "@pos[0, -1]\n" if @pos;

__DATA__
1 50
2 50
3 1
4 10
5 49
6 8
7 50
8 5
9 5
10 40
```

Note: I changed \t to \s+ just to create a self-contained example.
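
A small aside on the `@pos[0, -1]` slice used in the reply above: it picks the first and last collected coordinates in one step, and for a region that is only one line long it returns the same coordinate twice. The sketch below only illustrates that behaviour; `region_bounds` is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): formats the first and last
# element of a list of coordinates with the same slice as the reply above.
sub region_bounds {
    my @pos = @_;
    return "@pos[0, -1]";    # index 0 is the first element, -1 the last
}

print region_bounds(3, 4, 5, 6), "\n";    # prints "3 6"
print region_bounds(8), "\n";             # prints "8 8" (single-line region)
```

So a lone low-scoring line is still reported as a region whose start and end are the same coordinate, which seems to fit the "first and last positions" wording of the question.
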
Re^2: Iterating through file to find specific subsets of lines by thegirlm0nkey (Initiate) on Dec 04, 2013 at 15:35 UTC
Thank you! Exactly what I needed!
Re: Iterating through file to find specific subsets of lines by jethro (Monsignor) on Dec 04, 2013 at 15:23 UTC
```
my $dipped = 0;
my $column;
foreach ...
    ...
    if ($score < 50) {
        $dipped = $columns[0] if (not $dipped);
    }
    else {
        print "$dipped $column\n" if ($dipped);
        $dipped = 0;
    }
    $column = $columns[0];
}
print "$dipped $column\n" if ($dipped);
```

Untested. $dipped is the variable that stores the first co-ordinate whose score dips under 50 in a "dip region", and by being non-zero it also signals that you are in such a region. The construct with $column is necessary to get the number on the last line out of the foreach loop if a dipped region lasts until the end; it could be avoided by declaring @columns outside the loop.

UPDATE: Removed the off-by-one error found by toolic, using $column to store `$columns[0]` of the previous loop step.
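
For anyone who wants to run the scalar approach, here is a minimal self-contained sketch of it. It is one possible reading, not jethro's exact code: `find_low_regions` and `$threshold` are hypothetical names, the lines are assumed to be "co-ordinate, whitespace, score", and `undef` is used instead of 0 as the "not in a region" marker so that a co-ordinate of 0 would not be mistaken for "outside".

```perl
use strict;
use warnings;

# Hypothetical helper (name and signature are assumptions): returns a list
# of [first, last] co-ordinate pairs for runs of scores below $threshold.
sub find_low_regions {
    my ($threshold, @lines) = @_;
    my ($start, $prev);    # $start stays undef while outside a low region
    my @regions;
    foreach my $line (@lines) {
        chomp $line;
        my ($coord, $score) = split ' ', $line;    # splits on tabs or spaces
        next unless defined $score;                # skip blank/short lines
        if ($score < $threshold) {
            $start = $coord unless defined $start; # region opens here
        }
        elsif (defined $start) {
            push @regions, [$start, $prev];        # region ended on previous line
            undef $start;
        }
        $prev = $coord;    # remember this co-ordinate for the next step
    }
    # A region that runs to the end of the input is closed here.
    push @regions, [$start, $prev] if defined $start;
    return @regions;
}

my @data = ("1 50", "2 50", "3 1", "4 10", "5 49",
            "6 8", "7 50", "8 5", "9 5", "10 40");
print "@$_\n" for find_low_regions(50, @data);    # prints "3 6" then "8 10"
```

Only two scalars are kept per region rather than every co-ordinate, which is the efficiency point toolic raises in the follow-up.
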
Re^2: Iterating through file to find specific subsets of lines by toolic (Bishop) on Dec 04, 2013 at 15:37 UTC
Using your scalar would be more efficient than my array solution... if you could get rid of your off-by-1 error (tested).
Re: Iterating through file to find specific subsets of lines by choroba (Cardinal) on Dec 04, 2013 at 15:20 UTC
What output do you expect for the following input?

```
1 1
2 49
3 1
```

Sorry, did not understand the question.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

