PerlMonks  

filtering an array

by prbndr (Acolyte)
on Sep 01, 2012 at 19:08 UTC (#991184)
prbndr has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

Another rookie Perl question... I have an array that I populate with the following code:

 push @positions, $read->end + 1, $base;

The first entry ($read->end + 1) is a number and the second entry is a string that can be A, G, C, or T, so each line of this array is a number/string pair.
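A point worth knowing here: push appends every value in its list as a separate element, so the push above builds one flat array (number, base, number, base, ...) rather than rows of pairs. A minimal sketch, with made-up values standing in for $read->end + 1 and $base, comparing that with pushing an array reference, which keeps each pair together as one element:

```perl
use strict;
use warnings;

# Flat form: each push adds two separate elements.
my @flat;
push @flat, 101, 'A';
push @flat, 102, 'G';
print scalar(@flat), " elements in \@flat\n";     # 4 elements, not 2 pairs

# Paired form: each push adds one element, a reference to a two-item array.
my @pairs;
push @pairs, [ 101, 'A' ];
push @pairs, [ 102, 'G' ];
print scalar(@pairs), " elements in \@pairs\n";   # 2 elements
print "second base: $pairs[1][1]\n";              # reach into a pair with [row][column]
```

With the paired form, "the second element in each line" is simply $_->[1].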

I would like to remove every line of this array that contains a G or a T as the string. After doing that, I would like to get rid of the strings altogether, so I just have an array with one column representing the number. How do I go about this?

I've looked into the pop function, which seems relevant for the second half of what I want to do, but I am unsure how to do the filtering, namely how to access the second element in each line of the array. Any advice would be much appreciated!
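On pop: it removes and returns only the last element of an array, so it won't strip the base out of every pair. If the array stays flat as built above, one way to filter it (a sketch; the sample values are made up) is to walk a copy two elements at a time with splice, keeping the number whenever its base is A or C:

```perl
use strict;
use warnings;

# Flat array as built by: push @positions, $read->end + 1, $base;
# (sample values are made up for illustration)
my @positions = (101, 'A', 102, 'G', 103, 'C', 104, 'T');

my @kept;
my @work = @positions;    # splice consumes its array, so work on a copy
while (my ($pos, $base) = splice(@work, 0, 2)) {
    push @kept, $pos if $base =~ /^[AC]$/;    # drop G and T, keep only the number
}
print "@kept\n";    # 101 103
```

The loop ends when splice returns an empty list, because a list assignment in boolean context yields the number of values assigned.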

Re: filtering an array
by philiprbrenan (Monk) on Sep 01, 2012 at 19:41 UTC
    use feature ":5.14";
    use warnings FATAL => qw(all);
    use strict;
    use Data::Dump qw(dump pp);

    my @positions;

    for (split /\n/, <<'END')
    1 ACAC
    2 AGAC
    3 AGTC
    4 ACCA
    END
     {push @positions, [split / /] unless /[GT]/;
     }

    say "With bases: ", dump(@positions);
    $_ = $_->[0] for @positions;
    say "Without bases: ", dump(@positions);

    Produces

    With bases: ([1, "ACAC"], [4, "ACCA"])
    Without bases: (1, 4)

      I should say that the string is just a single letter, either "A", "T", "G", or "C", not multiple letters combined.

      The script will work as long as the string is one or more characters long and does not contain spaces.
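Since each base is a single letter, the same split-and-filter idea carries over unchanged. A core-Perl sketch (no Data::Dump; the sample lines are made up) that tests only the base field against A or C instead of searching the whole line for G or T:

```perl
use strict;
use warnings;

my @lines = ('1 A', '2 G', '3 T', '4 C');    # hypothetical sample data

my @positions;
for my $line (@lines) {
    my ($pos, $base) = split ' ', $line;     # ' ' splits on any run of whitespace
    push @positions, $pos if $base =~ /^[AC]$/;
}
print "@positions\n";    # 1 4
```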

Re: filtering an array
by jwkrahn (Monsignor) on Sep 01, 2012 at 20:18 UTC
    push @positions, [ $read->end + 1, $base ];

    "I would like to remove every line of this array that contains a G or a T as the string"

    @positions = grep $_->[ 1 ] =~ /[ac]/i, @positions;

    "I would like to get rid of the strings altogether, so I just have an array with one column representing the number"

    @positions = map $_->[ 0 ], @positions;
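The two steps can also be chained into one statement, read right to left: grep keeps the pairs whose base is A or C, then map pulls out the number. A sketch assuming the array-reference form of push from the reply above, with made-up sample pairs:

```perl
use strict;
use warnings;

# Pairs as built by: push @positions, [ $read->end + 1, $base ];
# (sample values are made up)
my @positions = ([101, 'A'], [102, 'G'], [103, 'C'], [104, 'T']);

# grep filters the pairs, then map extracts column 0 (the number)
my @numbers = map { $_->[0] } grep { $_->[1] =~ /^[AC]$/ } @positions;
print "@numbers\n";    # 101 103
```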
      This code gives the warning: Use of uninitialized value in pattern match (m//). What exactly does this mean?

        You are using a variable in a pattern-matching regular expression, and that variable currently has no value (it is undef). Please try assigning a value to the variable and see what happens. Thanks.
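In this thread, a likely cause (an assumption, since the full script isn't shown) is that @positions was still filled with the flat push, or that some $base was undef when it was pushed, so $_->[1] has no value when the match runs. A defensive sketch with made-up data that checks defined first and reports the offending entry instead of triggering the warning:

```perl
use strict;
use warnings;

# One pair has an undefined base (made-up data to reproduce the problem)
my @positions = ([101, 'A'], [102, undef], [103, 'C']);

my @numbers;
for my $pair (@positions) {
    my ($pos, $base) = @$pair;
    unless (defined $base) {
        print "no base recorded for position $pos\n";
        next;    # skip it rather than match against undef
    }
    push @numbers, $pos if $base =~ /^[AC]$/;
}
print "@numbers\n";    # 101 103
```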

Re: filtering an array
by swampyankee (Parson) on Sep 02, 2012 at 01:35 UTC

    Well, what have you tried?

    There is, of course, more than one way to do this. One way is to use grep:

    my @after;
    @after = grep { !/[GT]/ } @positions;

    which is, in essence,

    my @after;
    foreach (@positions) {
        push(@after, $_) if (!/[GT]/);
    }

    I tend to use parentheses where Perl's syntax doesn't require them, but that's just me. I find they frequently improve readability.
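A quick check that the grep form and the explicit loop agree, assuming each element is a single "number bases" string as in the other replies. (On the OP's flat array, either filter would drop the G/T strings but keep their numbers.)

```perl
use strict;
use warnings;

my @positions = ('1 ACAC', '2 AGAC', '3 AGTC', '4 ACCA');

# grep form
my @after_grep = grep { !/[GT]/ } @positions;

# explicit loop form
my @after_loop;
foreach (@positions) {
    push(@after_loop, $_) if (!/[GT]/);
}

print "@after_grep\n";    # 1 ACAC 4 ACCA
print "same result\n" if "@after_grep" eq "@after_loop";
```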



Re: filtering an array
by ww (Bishop) on Sep 02, 2012 at 01:50 UTC
    Alternate approach (with a little error checking, which I hope is so basic and transparent as to need no explanation):

    You might want to upend your thought process. Simply warn about any elements of your data that contain "G" or "T" (and, as you'll see in the code below, any nonconforming data: letters other than ACTG, too many or too few letters, etc.) and skip past them with next, without saving them. Save the desired elements (the digits) to your @positions array only after substituting away the run of "A"s and "C"s, and, when you've processed all the data, print the positions that satisfy your criteria:

    #!/usr/bin/perl
    use 5.014; # 991184

    my @positions;
    my @data = ('1 ACAC', '2 AGAC', '3 AGTC', '4 ACCA', '5 DUMMY', '6 ACAATG', '7 CAAC', '8 acaacc', '9 aca ', );

    for my $data(@data) {
        if ( $data =~ /[GT]/ || $data =~ /[^[ACTG]{4}$/) {
            say "Data ERROR or contains G or T ( $data )";
            next;
        } elsif ( $data =~ /^\d+ [AC]{4}$/i ) { # too many; too few? covered here
            $data =~ s/ [AC]{4}//i;
            chomp $data;
            push @positions, $data;
        } else {
            say "Problem with data? $data";
        }
    }

    print "\n Good data at positions: ";
    for my $position (@positions) {
        print "$position ";   # depending on size of valid positions, you may
    }                         # want to stack them vertically --
                              # simply replace the 'print' with 'say'
    say "\n Done";

    Output:

    C:\ 991184.pl
    Data ERROR or contains G or T ( 2 AGAC )
    Data ERROR or contains G or T ( 3 AGTC )
    Data ERROR or contains G or T ( 5 DUMMY )
    Data ERROR or contains G or T ( 6 ACAATG )
    Data ERROR or contains G or T ( 8 acaacc )
    Data ERROR or contains G or T ( 9 aca )

     Good data at positions: 1 4 7
     Done

    Done in babytalk, to some extent, to ensure clarity. Compare OP's questions about the Monks' responses, above.
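For single-letter bases, the same "skip it with next" idea fits directly into the loop that fills the array, so G and T never get stored at all. A sketch in which hypothetical [end, base] pairs stand in for the $read objects (their class isn't shown in the thread):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for ($read->end, $base) pairs
my @reads = ([100, 'A'], [101, 'G'], [102, 'C'], [103, 'T'], [104, 'N']);

my @positions;
for my $r (@reads) {
    my ($end, $base) = @$r;
    next if $base =~ /^[GT]$/;    # unwanted base: skip without saving
    unless ($base =~ /^[AC]$/) {  # nonconforming data: report it and skip
        warn "unexpected base '$base' at ", $end + 1, "\n";
        next;
    }
    push @positions, $end + 1;    # keep only the position
}
print "@positions\n";    # 101 103
```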

      Yes, I ended up doing exactly this, ww. I realized I was going after too complicated a solution instead of trying a much simpler approach. Glad we were on the same page!

Re: filtering an array
by vagabonding electron (Hermit) on Sep 02, 2012 at 14:55 UTC
    I am still a newbie, but (or therefore): what is wrong with using a hash here, with line numbers as keys and bases as values?
    #!/usr/bin/perl -l
    use strict;
    use warnings;

    print "If an array element is a line number and a base together as a string:";
    my %positions;
    for my $line (split /\n/, <<'END')
    1 ACAC
    2 AGAC
    3 AGTC
    4 ACCA
    END
    {
        next if $line =~ /[GT]/;
        my ($number, $bases ) = split / /, $line;
        $positions{$number} = $bases;
    }
    for my $number ( sort keys %positions ) {
        print "$number => $positions{$number}";
        print "Just print a number $number";
    }

    print "If there is an array element for a line number and for a base:";
    my @array = ( qw (1 ACAC 2 AGAC 3 AGTC 4 ACCA) );
    %positions = ();
    for (my $i = 0; $i < @array; $i +=2 ) {
        my ($number, $bases ) = @array[$i, $i+1];
        next if $bases =~ /[GT]/;
        $positions{$number} = $bases;
    }
    for my $number (sort keys %positions) {
        print "$number => $positions{$number}";
        print "Just print a number $number";
    }
    It prints:
    If an array element is a line number and a base together as a string:
    1 => ACAC
    Just print a number 1
    4 => ACCA
    Just print a number 4
    If there is an array element for a line number and for a base:
    1 => ACAC
    Just print a number 1
    4 => ACCA
    Just print a number 4
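Two things to keep in mind with the hash version: keys are unique, so a position seen twice keeps only the last base stored for it, and sort keys compares keys as strings, so 10 sorts before 2. A sketch with made-up positions showing the numeric sort that avoids the second issue:

```perl
use strict;
use warnings;

my %positions = (2 => 'A', 10 => 'C', 1 => 'A');    # made-up data

my @string_order  = sort keys %positions;                  # 1 10 2
my @numeric_order = sort { $a <=> $b } keys %positions;    # 1 2 10
print "@string_order\n@numeric_order\n";
```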

Node Type: perlquestion [id://991184]
Front-paged by Arunbear