PerlMonks  

filtering data and while loop problem

by coldy (Scribe)
on Apr 20, 2010 at 04:03 UTC (id://835641)

coldy has asked for the wisdom of the Perl Monks concerning the following question:

I posted a question yesterday entitled "extract relevent lines according to array" and got some good responses; however, I need to extend my problem.

Summary: I'd like to extract the values in the second column of <DATA> corresponding to the strings in @triples (hopefully it's apparent what I want to do from my code!)

I'm having a problem exiting "while ( defined(my $line=<DATA>) and !@values )". I think I'm also not entering the subroutine again after the first call.

#!/usr/bin/perl -w
use strict;

my @triples = ("chr1 9837 9840", "chr1 99998 99999", "chr2 9838 9840");
my($start,$chrom,$stop);

foreach my $triple (@triples){
    print "$triple :";
    ($chrom,$start,$stop)=split(/\s+/,$triple);
    my @values=();
    while ( defined(my $line=<DATA>) and !@values ) {
        @values = get_values() if ( $line =~ m/$chrom/);
        if(@values) {
            print "average ", average(\@values), "\n";
        }else {print "not found: average NA \n";$values[0]=1;}
    }
}

sub get_values {
    my @values;
    while ( defined(my $line=<DATA>) ) {
        last unless $line =~ m/^\d/;
        my ($tag,$value) = split(/\s+/,$line);
        push (@values, $value) if ($tag >= $start and $tag <= $stop);
    }
    return @values
}

sub average {
    my ($array_ref) = @_;
    my $sum;
    my $count = scalar @$array_ref;
    foreach (@$array_ref) { $sum += $_; }
    return $sum / $count;
}

__DATA__
variableStep chrom=chr1
9837 0.010
9838 0.008
9839 0.007
9840 0.004
9841 0.002
9842 0.001
variableStep chrom=chr2
9837 0.090
9838 0.038
9839 0.017
9840 0.044
9841 0.052
9842 0.091

Replies are listed 'Best First'.
Re: filtering data and while loop problem
by samarzone (Pilgrim) on Apr 20, 2010 at 05:13 UTC
    When you do

    last unless $line =~ m/^\d/;

    in your get_values subroutine, it keeps reading lines and stops at the first line that does not start with a digit. By then that line has already been consumed from DATA. Here it is

    variableStep chrom=chr2

    in your __DATA__ section. Now when the code flow goes back to

    while ( defined(my $line= <DATA> ) and !@values )

    in the foreach loop, $line gets the line

    9837    0.090

    which, I think, is not what you are expecting.
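The skipped header can be reproduced in isolation. The following is only a minimal sketch: it assumes an in-memory filehandle ($fh, a hypothetical stand-in for DATA) holding a shortened copy of the data, and a loop shaped like the one in get_values.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the DATA handle: an in-memory file with the
# same layout as the __DATA__ section in the question (shortened).
my $text = <<'END';
variableStep chrom=chr1
9837 0.010
9838 0.008
variableStep chrom=chr2
9837 0.090
9838 0.038
END

open my $fh, '<', \$text or die "cannot open in-memory file: $!";

# Outer read: consumes the chr1 header.
my $header = <$fh>;

# Inner loop, same shape as get_values(): it stops on the first line that
# does not start with a digit, but that line has already been read.
my $stopped_on;
while ( defined( my $line = <$fh> ) ) {
    unless ( $line =~ /^\d/ ) {
        $stopped_on = $line;
        last;
    }
}

# The next outer read therefore starts after the chr2 header.
my $next = <$fh>;

chomp( $header, $stopped_on, $next );
print "inner loop stopped on: $stopped_on\n";
print "next outer read sees:  $next\n";
```

The chr2 header is gone by the time the outer loop reads again, so the m/$chrom/ test in the outer loop never gets to see it.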
Re: filtering data and while loop problem
by ikegami (Patriarch) on Apr 20, 2010 at 05:37 UTC

    You don't quite have for (...) { while (<DATA>) { } }, but you do suffer from the same problem (in addition to the one samarzone mentioned). If the first pass of the loop reads DATA until its end of file, what do you think the next pass of the loop will read?

    You should be loading the contents of DATA into a memory structure, and extracting the data you need from the data structure for each triple.

    use strict;
    use warnings;

    my %data;
    {
        my $chrom;
        while (<DATA>) {
            chomp;
            if (/^variableStep chrom=(\S+)/) {
                $chrom = $1;
            } else {
                my ($tag, $value) = split;
                push @{ $data{$chrom} }, [ $tag, $value ];
            }
        }
    }

    for (
        [qw( chr1 9837 9840 )],
        [qw( chr1 99998 99999 )],
        [qw( chr2 9838 9840 )],
    ) {
        my ($chrom, $start, $stop) = @$_;
        my $chrom_d = $data{$chrom};
        my $count;
        my $sum;
        for (@$chrom_d) {
            my ($tag, $value) = @$_;
            next if $tag < $start || $tag > $stop;
            ++$count;
            $sum += $value;
        }

        if ($count) {
            my $avg = $sum/$count;
            print("$chrom $start..$stop average: $avg\n");
        } else {
            print("$chrom $start..$stop average: NA\n");
        }
    }

    __DATA__
    variableStep chrom=chr1
    9837 0.010
    9838 0.008
    9839 0.007
    9840 0.004
    9841 0.002
    9842 0.001
    variableStep chrom=chr2
    9837 0.090
    9838 0.038
    9839 0.017
    9840 0.044
    9841 0.052
    9842 0.091

    chr1 9837..9840 average: 0.00725
    chr1 99998..99999 average: NA
    chr2 9838..9840 average: 0.033
Re: filtering data and while loop problem
by nvivek (Vicar) on Apr 20, 2010 at 05:27 UTC
    Try this; it will work.

    use strict;
    use warnings;
    use Data::Dumper;

    my @triples = ("chr1 9837 9840", "chr1 99998 99999", "chr2 9838 9840");
    my($start,$chrom,$stop,$avg);

    foreach my $triple (@triples){
        print "$triple :";
        ($chrom,$start,$stop)=split(/\s+/,$triple);
        my @values=();
        while ( defined(my $line=<DATA>) and !@values ) {
            if ( $line =~ m/$chrom/) {
                @values=get_values();
                if(@values) {
                    average(\@values);
                    print "Average:",average(\@values),"\n" ;
                }
            }
        }
        print "Not Found:Average NA\n" unless (@values);
        seek DATA,0,0;
    }

    sub get_values {
        my @values;
        while ( defined(my $line=<DATA>) ) {
            last unless $line =~ m/^\d/;
            my ($tag,$value) = split(/\s+/,$line);
            push (@values, $value) if ($tag >= $start and $tag <= $stop);
        }
        return @values
    }

    sub average {
        my ($array_ref) = @_;
        my $sum;
        my $count = scalar @$array_ref;
        foreach (@$array_ref) { $sum += $_; }
        return $sum / $count;
    }

    __DATA__
    variableStep chrom=chr1
    9837 0.010
    9838 0.008
    9839 0.007
    9840 0.004
    9841 0.002
    9842 0.001
    variableStep chrom=chr2
    9837 0.090
    9838 0.038
    9839 0.017
    9840 0.044
    9841 0.052
    9842 0.091
      DATA is the handle Perl uses to parse the file, so seek DATA,0,0; seeks all the way back to the #! line. That is way too far back. You'd need to record the starting position with tell before reading any data, and then seek to that position.
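The tell-then-seek idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the poster's code: $script and $fh are hypothetical stand-ins (an in-memory copy of a script file and a handle on it), and the first loop only mimics Perl's parser stopping at the __DATA__ line.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a script file: Perl parses up to the __DATA__
# line and leaves the DATA handle positioned just after it.
my $script = <<'END';
#!/usr/bin/perl -w
use strict;
__DATA__
9837 0.010
9838 0.008
END

open my $fh, '<', \$script or die "cannot open in-memory file: $!";
while ( defined( my $line = <$fh> ) ) {
    last if $line =~ /^__DATA__$/;    # mimic the parser stopping here
}

# Remember where the data starts before reading any of it.
my $data_start = tell($fh);

my @first_pass = <$fh>;               # reads to end of file

# seek($fh, 0, 0) would land on the #! line; use the saved offset instead.
seek( $fh, $data_start, 0 ) or die "seek failed: $!";
my @second_pass = <$fh>;

print "second pass starts with: $second_pass[0]";
```

In a real script the equivalent would be my $data_start = tell DATA; before the foreach loop, and seek DATA, $data_start, 0; in place of seek DATA,0,0;.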
Re: filtering data and while loop problem
by Marshall (Canon) on Apr 20, 2010 at 12:48 UTC
    I see that you are using parts of my code, and that is a good thing! That code had some limitations, although it worked for your originally posted problem. I posted some more code at Re^5: extract relevent lines according to array, based upon the new requirement you gave in that thread. I guess you were posting here while I was in the process of writing more code. Have a look and use what you want.
