
Iterating through file to find specific subsets of lines

by thegirlm0nkey (Initiate)
on Dec 04, 2013 at 15:04 UTC (#1065602=perlquestion)
thegirlm0nkey has asked for the wisdom of the Perl Monks concerning the following question:

Hi - getting extremely stuck and would love some insight! I have a file which contains sequential numbers (actually genomic co-ordinates, I'm an amateur bioinformatician!) and an associated score. I need to extract regions where the score dips below a certain level. The file looks something like this:
    1 50
    2 50
    3 1
    4 10
    5 49
    6 8
    7 50
    8 5
    9 5
    10 40
So in this example - the first number on each line is the co-ordinate, and the second is the score. I need all the regions scoring less than 50, so for the small example above, I would get something like:
    3 6
    8 10
Hope that makes sense - I'm basically looking for the first and last positions where the score is less than 50. So far I have slurped the file into an array like this:
    foreach my $line (@lines) {
        chomp $line;
        my @columns = split(/\t/, $line);
        my $score = $columns[1];
        if ($score < 50) {
            #something here...
        }
    }
But I'm stuck with the 'something here' - I need to keep track of the first time a score of less than 50 is seen, and the last time it is seen before it goes above 50, and capture the two corresponding $columns[0] numbers. Really hope I've explained this properly! TIA.

Replies are listed 'Best First'.
Re: Iterating through file to find specific subsets of lines
by toolic (Bishop) on Dec 04, 2013 at 15:19 UTC
    Keep track of the positions in an array:
    use warnings;
    use strict;

    my @lines = <DATA>;
    my @pos;
    foreach my $line (@lines) {
        chomp $line;
        my @columns = split( /\s+/, $line );
        my $score = $columns[1];
        if ($score < 50) {
            push @pos, $columns[0];
        }
        else {
            print "@pos[0, -1]\n" if @pos;
            @pos = ();
        }
    }
    print "@pos[0, -1]\n" if @pos;

    __DATA__
    1 50
    2 50
    3 1
    4 10
    5 49
    6 8
    7 50
    8 5
    9 5
    10 40

    Note: I changed \t to \s+ just to create a self-contained example.

      Thank you! Exactly what I needed!
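For a large coordinate file, the same array-based idea could be applied without slurping, reading one line at a time instead. The following is only a minimal sketch, assuming whitespace-separated columns. The helper name `regions_from_handle` and the in-memory filehandle are illustrative and do not come from the thread; a real script would open the actual file instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): reads "coordinate score" lines
# from an open filehandle and returns one "first last" string for each run
# of consecutive lines whose score is below $threshold.
sub regions_from_handle {
    my ($fh, $threshold) = @_;
    my (@pos, @out);
    while (my $line = <$fh>) {
        chomp $line;
        my ($coord, $score) = split ' ', $line;
        next unless defined $score;          # skip blank or malformed lines
        if ($score < $threshold) {
            push @pos, $coord;               # still inside a low-scoring run
        }
        else {
            push @out, "@pos[0, -1]" if @pos;  # run just ended: first and last
            @pos = ();
        }
    }
    push @out, "@pos[0, -1]" if @pos;        # run that lasts until end of input
    return @out;
}

# In-memory filehandle standing in for the real file (an assumption for the demo)
my $data = join '', map { "$_\n" }
    '1 50', '2 50', '3 1', '4 10', '5 49', '6 8', '7 50', '8 5', '9 5', '10 40';
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
print "$_\n" for regions_from_handle($fh, 50);   # prints "3 6" then "8 10"
close $fh;
```

The array still grows with the length of each low-scoring run, so very long runs cost memory; keeping only the first and previous coordinate in scalars avoids that.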
Re: Iterating through file to find specific subsets of lines
by jethro (Monsignor) on Dec 04, 2013 at 15:23 UTC
    my $dipped = 0;
    my $column;
    foreach ...
        ...
        if ($score < 50) {
            $dipped = $columns[0] if (not $dipped);
        } else {
            print "$dipped $column\n" if ($dipped);
            $dipped = 0;
        }
        $column = $columns[0];
    }
    print "$dipped $column\n" if ($dipped);

    Untested. $dipped stores the co-ordinate of the first line whose score dips under 50 in a "dip region"; by being non-zero it also signals that you are currently inside such a region. $column is needed to carry the co-ordinate of the last line out of the foreach loop in case a dip region lasts until the end of the file. It could be avoided by declaring @columns outside the loop.

    UPDATE: Removed the off-by-one error found by toolic, using $column to store $columns[0] of the previous loop step

      Using your scalar would be more efficient than my array solution... if you could get rid of your off-by-1 error (tested).
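The scalar-tracking idea above might be fleshed out into a complete, testable form along these lines. This is only a sketch under stated assumptions: the sub name `dip_regions` and its arguments are hypothetical, columns are assumed to be whitespace-separated, and `undef` is swapped in for 0 as the "not in a region" marker so that a co-ordinate of 0 is not mistaken for "outside a dip".

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a threshold and a list of
# "coordinate score" lines, and returns [first, last] pairs, one for each run
# of consecutive lines scoring below the threshold.
sub dip_regions {
    my ($threshold, @lines) = @_;
    my @regions;
    my ($start, $prev);                  # $start stays undef outside a dip
    for my $line (@lines) {
        my ($pos, $score) = split ' ', $line;
        next unless defined $score;      # skip blank or malformed lines
        if ($score < $threshold) {
            $start = $pos unless defined $start;   # first line of a new dip
        }
        elsif (defined $start) {
            push @regions, [$start, $prev];        # dip ended on previous line
            undef $start;
        }
        $prev = $pos;                    # remember co-ordinate for next step
    }
    push @regions, [$start, $prev] if defined $start;  # dip reaching the end
    return @regions;
}

my @sample = ('1 50', '2 50', '3 1', '4 10', '5 49',
              '6 8', '7 50', '8 5', '9 5', '10 40');
print "@$_\n" for dip_regions(50, @sample);   # prints "3 6" then "8 10"
```

Only two scalars are kept per region here, which is the efficiency point made in the reply above.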
Re: Iterating through file to find specific subsets of lines
by choroba (Bishop) on Dec 04, 2013 at 15:20 UTC
    What output do you expect for the following input?
    1 1
    2 49
    3 1

    Sorry, did not understand the question.

    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
