Creating 2 arrays from a larger array

by imtakinbioinformatic (Initiate)
on Mar 08, 2012 at 02:00 UTC (#958397, perlquestion)
imtakinbioinformatic has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm working on code that reads an input file made of repeated records: a line that starts with >, followed by one or more lines of characters. I want to put each > line into one array, and put all of the lines that follow it (up to the next > line) into a second array as a single element. So I want to end up with one array whose elements are the > lines, and another array whose elements are the text that follows each > line. My code handles the > lines the way I want, but the second array seems to just fill up with the text that followed the first >. I feel like this is confusing, but any help would be greatly appreciated!

foreach $line (@DNA) {
    if ($line =~ /^>(\S+)/) {
        $seqID = $1;
        push(@seqList, $seqID);
        push(@sequences, $dnaString);
    }
    else {
        chomp $line;
        $dnaString = $dnaString . $line;
    }
}

Re: Creating 2 arrays from a larger array
by tangent (Curate) on Mar 08, 2012 at 02:43 UTC
    Not sure of your data but this might help:
    my @seqList;
    my @sequences;
    my $dnaString;
    my $count = 0;
    my @DNA = <DATA>;
    foreach my $line (@DNA) {
        if ($line=~/^>(\S+)/){
            push (@seqList, $1);
            push (@sequences, $dnaString) if $count++;
            $dnaString = '';
        }
        else {
            chomp $line;
            $dnaString .= $line;
        }
    }
    push (@sequences, $dnaString); # need to push last one
    __DATA__
    >123 blah
    abcdef
    ghijkl
    >456 de dah
    mnopqr
    >789 nothing wanted here
    stuvwxyz

    # OUTPUT
    @seqList = ( '123', '456', '789' );
    @sequences = ( 'abcdefghijkl', 'mnopqr', 'stuvwxyz' );
      Thanks tangent! I'm confused about how push (@sequences, $dnaString) ever runs. If the line matches >, its ID goes into the @seqList array, so how is the @sequences array being filled?
        You are in a loop, so $dnaString is appended to every time the line doesn't match your start pattern. $dnaString is always one step behind @seqList, so when you come to the next start line, the code pushes the previous $dnaString onto @sequences, then clears it for the next record. Best thing to do is try it: set $count = 1 and see what happens.

        Update: after re-reading this I'm a bit confused myself. Better to put a print statement within the loop:
        my $dnaString = '';
        ...
        if ($line=~/^>(\S+)/){
            print qq|Count: $count, Match: $1, String: $dnaString\n|;
            push (@seqList, $1);
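
The "one step behind" bookkeeping described above can be avoided. The following is a minimal sketch, not code from the thread: it opens an empty slot in @sequences as soon as a > line is seen and appends later lines to the last element, so neither $count nor a final push is needed. The contents of @DNA here are an assumed stand-in for the lines read from the input file, copied from the sample data in the reply above.

```perl
use strict;
use warnings;

# Assumed sample input, same shape as the __DATA__ block in the reply above.
my @DNA = (
    ">123 blah\n", "abcdef\n", "ghijkl\n",
    ">456 de dah\n", "mnopqr\n",
    ">789 nothing wanted here\n", "stuvwxyz\n",
);

my (@seqList, @sequences);

foreach my $line (@DNA) {
    if ($line =~ /^>(\S+)/) {
        push @seqList, $1;       # record the ID from the > line
        push @sequences, '';     # open an empty slot for this record's text
    }
    elsif (@sequences) {         # ignore any lines before the first > line
        chomp(my $text = $line); # copy first, so @DNA itself is not modified
        $sequences[-1] .= $text; # append to the most recent record
    }
}

# Print each ID next to its joined sequence.
print "$seqList[$_] => $sequences[$_]\n" for 0 .. $#seqList;
```

Because every header pushes onto both arrays at once, index $i in @seqList always matches index $i in @sequences, including for a record that has no sequence lines at all.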
Re: Creating 2 arrays from a larger array
by Anonymous Monk on Mar 08, 2012 at 02:40 UTC
