PerlMonks  

Iterating through file to find specific subsets of lines

by thegirlm0nkey (Initiate)
on Dec 04, 2013 at 15:04 UTC (#1065602=perlquestion)
thegirlm0nkey has asked for the wisdom of the Perl Monks concerning the following question:

Hi - getting extremely stuck and would love some insight! I have a file which contains sequential numbers (actually genomic co-ordinates, I'm an amateur bioinformatician!) and an associated score. I need to extract regions where the score dips below a certain level. The file looks something like this:
1 50
2 50
3 1
4 10
5 49
6 8
7 50
8 5
9 5
10 40
So in this example - the first number on each line is the co-ordinate, and the second is the score. I need all the regions scoring less than 50, so for the small example above, I would get something like:
3 6
8 10
Hope that makes sense - I'm basically looking for the first and last positions where the score is less than 50. So far I have slurped the file into an array like this:
foreach my $line (@lines) {
    chomp $line;
    my @columns = split(/\t/, $line);
    my $score = $columns[1];
    if ($score < 50) {
        #something here...
    }
}
But I'm stuck with the 'something here' - I need to keep track of the first time a score of less than 50 is seen, and the last time it is seen before it goes above 50, and capture the two corresponding $columns[0] numbers. Really hope I've explained this properly! TIA.

Re: Iterating through file to find specific subsets of lines
by toolic (Chancellor) on Dec 04, 2013 at 15:19 UTC
    Keep track of the positions in an array:
    use warnings;
    use strict;

    my @lines = <DATA>;
    my @pos;
    foreach my $line (@lines) {
        chomp $line;
        my @columns = split( /\s+/, $line );
        my $score = $columns[1];
        if ($score < 50) {
            push @pos, $columns[0];
        }
        else {
            print "@pos[0, -1]\n" if @pos;
            @pos = ();
        }
    }
    print "@pos[0, -1]\n" if @pos;

    __DATA__
    1 50
    2 50
    3 1
    4 10
    5 49
    6 8
    7 50
    8 5
    9 5
    10 40

    Note: I changed \t to \s+ just to create a self-contained example.
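
For a large co-ordinate file, the same array idea can be applied while reading line by line instead of slurping everything first. The following is only a minimal sketch under assumptions: the subroutine name low_regions_from_fh and its arguments are made up for illustration and do not come from the thread, and the demo feeds the sample data through an in-memory filehandle rather than a real file.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads "coordinate score" lines
# from a filehandle and returns one [first, last] pair for each run of
# consecutive lines whose score is below $cutoff.
sub low_regions_from_fh {
    my ($fh, $cutoff) = @_;
    my (@regions, @pos);
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;                # skip blank lines
        my ($coord, $score) = split ' ', $line;   # splits on tabs or spaces
        if ($score < $cutoff) {
            push @pos, $coord;                    # still inside a low region
        }
        elsif (@pos) {
            push @regions, [ @pos[0, -1] ];       # region ended on the previous line
            @pos = ();
        }
    }
    push @regions, [ @pos[0, -1] ] if @pos;       # region ran to end of input
    return @regions;
}

# Demo on the sample data from the question, via an in-memory filehandle.
my $sample = join '', map { "$_\n" }
    '1 50', '2 50', '3 1', '4 10', '5 49', '6 8', '7 50', '8 5', '9 5', '10 40';
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
print "@$_\n" for low_regions_from_fh($fh, 50);
close $fh;
```

With a real file, the in-memory open would be replaced by an ordinary open on the file's path; the rest should stay the same.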

      Thank you! Exactly what I needed!
Re: Iterating through file to find specific subsets of lines
by choroba (Abbot) on Dec 04, 2013 at 15:20 UTC
    What output do you expect for the following input?
    1 1
    2 49
    3 1

    Sorry, did not understand the question.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Iterating through file to find specific subsets of lines
by jethro (Monsignor) on Dec 04, 2013 at 15:23 UTC
    my $dipped = 0;
    my $column;
    foreach ... {
        ...
        if ($score < 50) {
            $dipped = $columns[0] if (not $dipped);
        }
        else {
            print "$dipped $column\n" if ($dipped);
            $dipped = 0;
        }
        $column = $columns[0];
    }
    print "$dipped $column\n" if ($dipped);

    Untested. $dipped stores the co-ordinate of the first score that dips under 50 in a "dip region". By being non-zero it also signals that you are inside such a region, which assumes that no co-ordinate is 0. The construct with $column is necessary to get the co-ordinate of the last line out of the foreach loop if a dip region lasts until the end of the file; it could be avoided by declaring @columns outside the loop.

    UPDATE: Removed the off-by-one error found by toolic, using $column to store $columns[0] of the previous loop step.
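
The scalar approach described above can be sketched as a self-contained script. This is only an illustration under assumptions: the subroutine name dip_regions and its argument order are invented here, and undef is used as the "not in a region" marker instead of 0 so that a co-ordinate of 0 would still work. It is not the code from the post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a cutoff and a list of
# "coordinate score" lines, and returns one "first last" string for each
# run of consecutive lines scoring below the cutoff.
sub dip_regions {
    my ($cutoff, @lines) = @_;
    my @regions;
    my ($start, $prev);                  # $start is undef while outside a region
    for my $line (@lines) {
        my ($coord, $score) = split ' ', $line;
        if ($score < $cutoff) {
            $start = $coord unless defined $start;    # first line of a dip
        }
        elsif (defined $start) {
            push @regions, "$start $prev";            # dip ended on the previous line
            undef $start;
        }
        $prev = $coord;                  # remember this co-ordinate for the next step
    }
    push @regions, "$start $prev" if defined $start;  # dip ran to end of input
    return @regions;
}

# Sample data from the question.
print "$_\n" for dip_regions(50, '1 50', '2 50', '3 1', '4 10', '5 49',
                                 '6 8', '7 50', '8 5', '9 5', '10 40');
```

Only two scalars are kept per region, so memory use does not grow with the length of a dip.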

      Using your scalar would be more efficient than my array solution... if you could get rid of your off-by-1 error (tested).

Node Type: perlquestion [id://1065602]
Approved by marto