PerlMonks  

extract relevent lines according to array

by coldy (Scribe)
on Apr 19, 2010 at 07:17 UTC

coldy has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks, I have a file of the form given in <DATA> and three variables that select the portion of the file I would like, e.g.
$chrom='chr1'; $start=9839; $stop=9841;

I would like to create an array, @values, consisting of the values in the second column of <DATA> selected by my three variables, $chrom, $start and $stop.

So my example values ($chrom='chr1'; $start=9839; $stop=9841;) would result in the array @values=(0.007,0.004,0.002)

I hope this makes sense. I was hoping somebody might have some ideas to help me out.

Cheers, Chris

__DATA__
variableStep chrom=chr1
9837 0.010
9838 0.008
9839 0.007
9840 0.004
9841 0.002
9842 0.001
variableStep chrom=chr2
9837 0.090
9838 0.038
9839 0.017
9840 0.044
9841 0.052
9842 0.091

Replies are listed 'Best First'.
Re: extract relevent lines according to array
by Corion (Patriarch) on Apr 19, 2010 at 07:19 UTC

    So, what code have you written and where and how does it fail for you?

      I have written this
      while (my $line = <DATA>) {
          if ($line =~ m/$chrom/) {
              # not sure how to move to the next few lines of <DATA>
              # where I can test that $s<$start
              my ($s, $prob) = split(/\s+/, $line);
              if ($s < $start) {
                  next;
              }
              else {
                  push @values, $prob;
              }
          }
      }
      Basically I don't know how to loop through the lines after the line that matches the $chrom condition, so that I can test whether $start > $s for those lines.

        The trick is to use variables to remember where you are in your processing. Either use a variable that records your "status" as a number (this is called a state machine), or use one or more variables that record different conditions (commonly called "flags"). Below is your script adapted to use a flag called $foundchrom:

        my $foundchrom = 0;
        while (my $line = <DATA>) {
            if ($line =~ m/^variableStep/) {
                if ($line =~ m/$chrom/) {
                    $foundchrom = 1;
                }
                else {
                    $foundchrom = 0;
                }
                # above if-then-else could be written shorter as
                # $foundchrom = ($line =~ m/$chrom/);
                next;
            }
            my ($s, $prob) = split(/\s+/, $line);
            if ($s < $start or $s > $stop or not $foundchrom) {
                next;
            }
            else {
                push @values, $prob;
            }
        }
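        The other option mentioned above, a variable that records a numeric "status", could look roughly like the sketch below. The state names (SEEKING, IN_BLOCK, DONE) and the helper extract_values() are made up for illustration and do not come from this thread; the data is passed in as a string instead of being read from <DATA>, purely so the sub can be called more than once.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical state names; the numbers themselves are arbitrary.
use constant {
    SEEKING  => 0,    # wanted chromosome block not reached yet
    IN_BLOCK => 1,    # currently inside the wanted block
    DONE     => 2,    # wanted block has ended, nothing more to collect
};

# extract_values() is an illustrative helper, not code from the thread.
sub extract_values {
    my ($text, $chrom, $start, $stop) = @_;
    my @values;
    my $state = SEEKING;

    for my $line (split /\n/, $text) {
        if ($line =~ /^variableStep\s+chrom=(\S+)/) {
            # Header lines are the only place where the state changes.
            if    ($1 eq $chrom)       { $state = IN_BLOCK }
            elsif ($state == IN_BLOCK) { $state = DONE }
            next;
        }
        last if $state == DONE;            # past the wanted block
        next unless $state == IN_BLOCK;    # still looking for it

        my ($pos, $prob) = split ' ', $line;
        push @values, $prob if $pos >= $start and $pos <= $stop;
    }
    return @values;
}

my $data = <<'END';
variableStep chrom=chr1
9837 0.010
9838 0.008
9839 0.007
9840 0.004
9841 0.002
9842 0.001
variableStep chrom=chr2
9837 0.090
9838 0.038
9839 0.017
9840 0.044
9841 0.052
9842 0.091
END

my @values = extract_values($data, 'chr1', 9839, 9841);
print "@values\n";    # prints: 0.007 0.004 0.002
```

        Comparing the captured chromosome name with eq, rather than interpolating $chrom into a regex, means chr1 does not also match a header such as chr10. That is a choice made in this sketch, not something the flag version above does.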
        One technique is to "remember" where you are by calling a subroutine. Below, I read data lines; when $chrom is seen, the sub is called to extract the values. The sub "knows" that we are at the right place and have found $chrom simply because it is executing. Then the appropriate values are extracted (the loop could be different if we took advantage of the sorted order of the input data). The main "while" loop quits when we either run out of DATA or the first record is found (something in @values).

        update: tested with DATA ending in EOF rather than yet another variableStep record and got an undefined $line error, so I changed the while in get_values() to while ( defined(my $line=<DATA>) ), like in the main loop.

        #!/usr/bin/perl -w
        use strict;

        my $chrom='chr1';
        my $start=9839;
        my $stop=9841;
        my @values;

        while ( defined(my $line=<DATA>) and !@values)
        {
            @values = get_values() if ( $line =~ m/\=$chrom$/);
        }

        sub get_values
        {
            my @values;
            while ( defined(my $line=<DATA>) )
            {
                last unless $line =~ m/^\d/;
                my ($tag,$value) = split(/\s+/,$line);
                push (@values, $value) if ($tag >= $start and $tag <= $stop);
            }
            return @values
        }
        print "@values";   #prints: 0.007 0.004 0.002

        __DATA__
        variableStep chrom=chr1
        9837 0.010
        9838 0.008
        9839 0.007
        9840 0.004
        9841 0.002
        9842 0.001
        variableStep chrom=chr2
        9837 0.090
        9838 0.038
        9839 0.017
        9840 0.044
        9841 0.052
        9842 0.091
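        The remark about taking advantage of the sorted order could be sketched as follows: once a position larger than $stop is seen, nothing further in the block can qualify, so the loop can stop early. This assumes the positions within each block are in ascending order, as they are in the sample data. The helper name values_in_range() and the in-memory filehandle are illustrative choices, not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# values_in_range() is a hypothetical helper. It ASSUMES positions are
# sorted in ascending order within each variableStep block.
sub values_in_range {
    my ($fh, $chrom, $start, $stop) = @_;
    my @values;
    my $in_block = 0;

    while (defined(my $line = <$fh>)) {
        if ($line =~ /^variableStep\s+chrom=(\S+)/) {
            last if $in_block;             # the wanted block just ended
            $in_block = ($1 eq $chrom);
            next;
        }
        next unless $in_block and $line =~ /^\d/;

        my ($pos, $value) = split ' ', $line;
        next if $pos < $start;             # not there yet
        last if $pos > $stop;              # sorted input: nothing more to find
        push @values, $value;
    }
    return @values;
}

my $data = <<'END';
variableStep chrom=chr1
9837 0.010
9838 0.008
9839 0.007
9840 0.004
9841 0.002
9842 0.001
variableStep chrom=chr2
9837 0.090
9838 0.038
9839 0.017
9840 0.044
9841 0.052
9842 0.091
END

# Read the sample data through an in-memory filehandle.
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my @values = values_in_range($fh, 'chr1', 9839, 9841);
close $fh;

print "@values\n";    # prints: 0.007 0.004 0.002
```

        On a large file this avoids reading the rest of the block and every later block; on unsorted input it would silently miss values, so the assumption matters.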
        The following is also possible, because Perl has the tricky .. and ... (flip-flop) operators! See "Flipin good, or a total flop?" for a good discussion.
        my @values;
        while (<DATA>)
        {
            if ( (/\=$chrom$/.../^v/) =~ m/^\d+(?<!^1)$/ )   #skip /start/ and /end/
            {
                my ($tag,$value)=split;
                push (@values, $value) if ($tag >= $start and $tag <= $stop);
            }
            else {last if @values}   #optional
        }
        print "@values";   #prints: 0.007 0.004 0.002
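        To see why matching the flip-flop's result against a regex skips the start and end lines, it helps to look at what ... returns in scalar context: the empty string while the range is off, a sequence number starting at 1 while it is on, and the last number of a range with "E0" appended. The sketch below records those values for a shortened copy of the sample data; flipflop_trace() is an illustrative helper, and the plain numeric test at the end is meant to select the same lines as the regex above, not to reproduce it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# flipflop_trace() is a hypothetical helper: it pairs each line with
# the value the ... operator returned for it.
sub flipflop_trace {
    my ($chrom, @lines) = @_;
    my @trace;
    for (@lines) {
        # Scalar assignment puts ... in scalar context, so it acts as a flip-flop.
        my $seq = /=\Q$chrom\E$/ ... /^v/;
        push @trace, [ $_, $seq ];
    }
    return @trace;
}

my @lines = (
    'variableStep chrom=chr1',
    '9839 0.007',
    '9840 0.004',
    '9841 0.002',
    'variableStep chrom=chr2',
    '9839 0.017',
);

for my $pair (flipflop_trace('chr1', @lines)) {
    my ($line, $seq) = @$pair;
    # Keep only "inner" lines: all digits (so no E0 suffix) and not the first one.
    my $inner = ($seq =~ /^\d+$/ and $seq != 1) ? 'keep' : 'skip';
    printf "%-24s seq='%s' %s\n", $line, $seq, $inner;
}
# seq values in order: 1, 2, 3, 4, 5E0 and the empty string
```

        Note that each flip-flop keeps its own on/off state, even across calls to the sub that contains it, so a range that never sees its end pattern would still be "on" at the next call. In this sample the chr2 header closes the range.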
Re: extract relevent lines according to array
by nvivek (Vicar) on Apr 19, 2010 at 09:08 UTC
    Try the following script:
    use strict;
    use warnings;
    use Data::Dumper;

    my $chrom='chr1';
    my $start=9839;
    my $stop=9841;
    my %hash=();
    my @values=();

    while(<DATA>)
    {
        if(/[a-z]+\s*chrom=$chrom/)
        {
            while(<DATA>)
            {
                if(/([0-9]+)\s*([0-9]+\.[0-9]+)/)
                {
                    $hash{$1}=$2;
                }
                else
                {
                    last;
                }
            }
        }
    }
    for ($start .. $stop)
    {
        push @values,$hash{$_};
    }
    print @values;

    __DATA__
    variableStep chrom=chr1
    9837 0.010
    9838 0.008
    9839 0.007
    9840 0.004
    9841 0.002
    9842 0.001
    variableStep chrom=chr2
    9837 0.090
    9838 0.038
    9839 0.017
    9840 0.044
    9841 0.052
    9842 0.091
Re: extract relevent lines according to array
by Skeeve (Parson) on Apr 19, 2010 at 14:47 UTC
    $chrom='chr1';
    $start=9839;
    $stop=9841;

    while (<DATA>) {
        if (/variableStep chrom=$chrom/ ... /variableStep chrom/) {
            if (/^$start\b/ .. /^$stop\b/) {
                chomp;
                push @result, (split ' ',$_,2)[1];
            }
        }
    }
    use Data::Dumper;
    print Dumper \@result;

    __DATA__
    variableStep chrom=chr1
    9837 0.010
    9838 0.008
    9839 0.007
    9840 0.004
    9841 0.002
    9842 0.001
    variableStep chrom=chr2
    9837 0.090
    9838 0.038
    9839 0.017
    9840 0.044
    9841 0.052
    9842 0.091

    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e

Node Type: perlquestion [id://835397]
Approved by ashokpj