PerlMonks  

Iterating through file to find specific subsets of lines

by thegirlm0nkey (Initiate)
on Dec 04, 2013 at 15:04 UTC (#1065602=perlquestion)
thegirlm0nkey has asked for the wisdom of the Perl Monks concerning the following question:

Hi - getting extremely stuck and would love some insight! I have a file which contains sequential numbers (actually genomic co-ordinates, I'm an amateur bioinformatician!) and an associated score. I need to extract regions where the score dips below a certain level. The file looks something like this:
1 50
2 50
3 1
4 10
5 49
6 8
7 50
8 5
9 5
10 40
So in this example - the first number on each line is the co-ordinate, and the second is the score. I need all the regions scoring less than 50, so for the small example above, I would get something like:
3 6
8 10
Hope that makes sense - I'm basically looking for the first and last positions where the score is less than 50. So far I have slurped the file into an array like this:
foreach my $line (@lines) {
    chomp $line;
    my @columns = split(/\t/, $line);
    my $score = $columns[1];
    if ($score < 50) {
        #something here...
    }
}
But I'm stuck with the 'something here' - I need to keep track of the first time a score of less than 50 is seen, and the last time it is seen before the score returns to 50 or above, and capture the two corresponding $columns[0] numbers. Really hope I've explained this properly! TIA.

Re: Iterating through file to find specific subsets of lines
by toolic (Chancellor) on Dec 04, 2013 at 15:19 UTC
    Keep track of the positions in an array:
    use warnings;
    use strict;

    my @lines = <DATA>;
    my @pos;
    foreach my $line (@lines) {
        chomp $line;
        my @columns = split( /\s+/, $line );
        my $score = $columns[1];
        if ($score < 50) {
            push @pos, $columns[0];
        }
        else {
            print "@pos[0, -1]\n" if @pos;
            @pos = ();
        }
    }
    print "@pos[0, -1]\n" if @pos;

    __DATA__
    1 50
    2 50
    3 1
    4 10
    5 49
    6 8
    7 50
    8 5
    9 5
    10 40

    Note: I changed \t to \s+ just to create a self-contained example.

      Thank you! Exactly what I needed!
Re: Iterating through file to find specific subsets of lines
by choroba (Abbot) on Dec 04, 2013 at 15:20 UTC
    What output do you expect for the following input?
    1 1
    2 49
    3 1

    Sorry, did not understand the question.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
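
For reference, here is a minimal sketch of what the array-based approach from this thread does with that input. The helper name `dip_regions` and its argument order are assumptions made for illustration, not something posted in the thread. Since all three scores are below 50, the rows form one unbroken run and the sketch reports a single region.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collects runs of consecutive
# rows whose score is below $threshold and returns "first last" strings.
sub dip_regions {
    my ($threshold, @rows) = @_;
    my (@regions, @pos);
    for my $row (@rows) {
        my ($coord, $score) = split ' ', $row;
        if ($score < $threshold) {
            push @pos, $coord;                     # still inside a dip
        }
        else {
            push @regions, "@pos[0, -1]" if @pos;  # dip just ended
            @pos = ();
        }
    }
    push @regions, "@pos[0, -1]" if @pos;          # dip ran to the end of the input
    return @regions;
}

print "$_\n" for dip_regions(50, '1 1', '2 49', '3 1');   # prints "1 3"
```

A score of 49 counts as "less than 50", so it does not split the run.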
Re: Iterating through file to find specific subsets of lines
by jethro (Monsignor) on Dec 04, 2013 at 15:23 UTC
    my $dipped=0;
    my $column;
    foreach ...
        ...
        if ($score < 50) {
            $dipped = $columns[0] if (not $dipped);
        }
        else {
            print "$dipped $column\n" if ($dipped);
            $dipped = 0;
        }
        $column = $columns[0];   # remember this position for the next step
    }
    print "$dipped $column\n" if ($dipped);

    Untested. $dipped stores the first position of a "dip region" (the first co-ordinate whose score dips under 50), and by being non-zero it also signals that you are inside such a region. $column holds the co-ordinate from the previous loop step; it is needed to get the number on the last line out of the foreach loop when a dip region lasts until the end. That could be avoided by declaring @columns outside the loop.

    UPDATE: Removed the off-by-one error found by toolic, using $column to store $columns[0] of the previous loop step.

      Using your scalar would be more efficient than my array solution... if you could get rid of your off-by-1 error (tested).
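
The scalar-tracking idea above can be sketched as a complete script. This is one possible reading, not the code either poster ran: the helper name `dip_regions_streaming`, the use of `undef` rather than 0 as the "not in a dip" marker (so that a co-ordinate of 0 would still work), and the in-memory filehandle used for the demo are all assumptions made for illustration. It reads one line at a time instead of slurping the file, which may matter for large co-ordinate files.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads "coordinate score" rows
# from a filehandle and returns [first, last] pairs for each run of rows
# scoring below $threshold.
sub dip_regions_streaming {
    my ($fh, $threshold) = @_;
    my @regions;
    my ($first, $last);                  # undef $first means "not inside a dip"
    while (my $line = <$fh>) {
        chomp $line;
        my ($coord, $score) = split ' ', $line;
        next unless defined $score;      # skip blank or malformed rows
        if ($score < $threshold) {
            $first = $coord unless defined $first;   # dip starts here
            $last  = $coord;                         # most recent dip row
        }
        elsif (defined $first) {
            push @regions, [$first, $last];          # dip ended on the previous row
            undef $first;
        }
    }
    push @regions, [$first, $last] if defined $first;   # dip reached end of input
    return @regions;
}

# Demo on the sample data from the question, via an in-memory filehandle.
my $sample = join '', map { "$_\n" } '1 50', '2 50', '3 1', '4 10', '5 49',
                                      '6 8', '7 50', '8 5', '9 5', '10 40';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "@$_\n" for dip_regions_streaming($fh, 50);   # prints "3 6" then "8 10"
close $fh;
```

For a real file, the same sub can be handed a handle from `open my $in, '<', $path`, where `$path` is whatever file name applies.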
