
Creating 2 arrays from a larger array

by imtakinbioinformatic (Initiate)
on Mar 08, 2012 at 02:00 UTC
imtakinbioinformatic has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I'm working on code that reads an input file made up of lines that start with >, each followed by one or more lines of characters (repeated many times). I want to put each line that starts with > into one array, and put all of the lines that follow it (up to the next >) into a second array as a single element. So I want to end up with one array that has all of the > lines as elements, and another array that has the text following each > line as its own element. My code handles the > lines the way I want, but the second array seems to just get filled with the text that followed the first >. I feel like this is confusing, but any help would be greatly appreciated!

foreach $line (@DNA) {
    if ($line =~ /^>(\S+)/) {
        $seqID = $1;
        push(@seqList, $seqID);
        push(@sequences, $dnaString);
    }
    else {
        chomp $line;
        $dnaString = $dnaString . $line;
    }
}

Replies are listed 'Best First'.
Re: Creating 2 arrays from a larger array
by tangent (Vicar) on Mar 08, 2012 at 02:43 UTC
    Not sure of your data but this might help:
    my @seqList;
    my @sequences;
    my $dnaString;
    my $count = 0;
    my @DNA = <DATA>;
    foreach my $line (@DNA) {
        if ($line =~ /^>(\S+)/) {
            push (@seqList, $1);
            push (@sequences, $dnaString) if $count++;
            $dnaString = '';
        }
        else {
            chomp $line;
            $dnaString .= $line;
        }
    }
    push (@sequences, $dnaString); # need to push last one
    __DATA__
    >123 blah
    abcdef
    ghijkl
    >456 de dah
    mnopqr
    >789 nothing wanted here
    stuvwxyz

    # OUTPUT
    @seqList = ( '123', '456', '789' );
    @sequences = ( 'abcdefghijkl', 'mnopqr', 'stuvwxyz' );
      Thanks tangent! I'm confused about how push (@sequences, $dnaString) ever runs. If the line matches >, its ID goes into the @seqList array, so how is the @sequences array being filled?
        You are in a loop, so $dnaString is appended to every time a line doesn't match your start pattern. $dnaString is always one step behind @seqList, so when you come to the next start line, it pushes the previous $dnaString onto @sequences, then clears it for the next sequence. Best thing to do is try it: set $count = 1 and see what happens.
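
If the one-step-behind bookkeeping described above is hard to follow, one way to sidestep it is to start a new, empty element in @sequences whenever a start line is seen and then append to the last element. This is only a sketch and not the code posted in this thread: the sub name split_fasta is made up for illustration, and ignoring any lines that come before the first > line is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns two array refs,
# one with the IDs and one with the joined sequence text for each ID.
sub split_fasta {
    my @lines = @_;
    my (@seqList, @sequences);
    foreach my $line (@lines) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            push @seqList, $1;
            push @sequences, '';        # start a new, empty sequence
        }
        elsif (@sequences) {
            $sequences[-1] .= $line;    # append to the current sequence
        }
        # Lines before the first header are skipped (an assumption).
    }
    return (\@seqList, \@sequences);
}

# Same sample data as in the reply above.
my ($ids, $seqs) = split_fasta(
    ">123 blah\n", "abcdef\n", "ghijkl\n",
    ">456 de dah\n", "mnopqr\n",
    ">789 nothing wanted here\n", "stuvwxyz\n",
);
print "$ids->[$_]: $seqs->[$_]\n" for 0 .. $#{$ids};
```

Because both arrays grow at the same moment, there is no need for $count or for the extra push after the loop.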

        Update: after re-reading this I'm a bit confused myself. Better to put a print statement within the loop:
        my $dnaString = '';
        ...
        if ($line=~/^>(\S+)/){
            print qq|Count: $count, Match: $1, String: $dnaString\n|;
            push (@seqList, $1);
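
For anyone who wants to run the trace idea from the update without editing their own script, here is a self-contained sketch. It reuses the loop and sample data from the reply above; the sub name trace_headers and the @log array are hypothetical names added here, and the trace is collected in a list rather than printed inside the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the loop from the reply above and records
# what $dnaString holds each time a start line is reached.
sub trace_headers {
    my @lines = @_;
    my (@seqList, @sequences, @log);
    my $dnaString = '';
    my $count     = 0;
    foreach my $line (@lines) {
        if ($line =~ /^>(\S+)/) {
            push @log, "Count: $count, Match: $1, String: $dnaString";
            push @seqList, $1;
            # Skip the push on the first start line: nothing collected yet.
            push @sequences, $dnaString if $count++;
            $dnaString = '';
        }
        else {
            chomp $line;
            $dnaString .= $line;
        }
    }
    push @sequences, $dnaString;    # the last sequence is still pending
    return @log;
}

my @trace = trace_headers(
    ">123 blah\n", "abcdef\n", "ghijkl\n",
    ">456 de dah\n", "mnopqr\n",
    ">789 nothing wanted here\n", "stuvwxyz\n",
);
print "$_\n" for @trace;
```

Each trace line shows the text collected for the previous start line, which is the "one step behind" behaviour: the string is empty at 123, and the text belonging to 123 only shows up when 456 is reached.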
Re: Creating 2 arrays from a larger array
by Anonymous Monk on Mar 08, 2012 at 02:40 UTC
