http://www.perlmonks.org?node_id=1058958

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi -- I'm working on writing code to check each string within an array. This is directed at checking that a DNA input contains only A, T, C, and G. I have written it such that you can input your DNA sequence on the command line. When I run this with an input such as "ATGTCAHCGT", I get nothing back where I should get a print statement of "Non-nucleotide at 6 position." What am I missing / where am I going wrong? Any help is much appreciated. Cheers, A

my $dna_input = $ARGV[0]; # turn command line input into a variable
my @dna = split("", $dna_input); # split the variable into an array so as to check each nucleotide
my $count = 0; # keep track of position so as to report where there is a non-nucleotide character
my $base;
foreach $base (@dna) {
    if($base eq 'A' || 'T' || 'C' || 'G') {
        $count += 1;
    }
    else {
        print "Non-nucleotide at $count position\n";
    }
}

Replies are listed 'Best First'.
Re: If/else within a foreach loop to check strings within an array
by choroba (Cardinal) on Oct 19, 2013 at 22:01 UTC
    Very similar to a recent question. See Re: Trying to compare a string....

    Update: In your case, a regular expression would be the easiest solution:

    my $dna = shift; # Get the first command line argument.
    if ($dna =~ /[^ACTG]/g) {
        die "Non-nucleotide at " . pos($dna);
    }
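A minimal sketch of how pos() behaves in this kind of check, assuming uppercase input; the sub name first_bad_position is made up for illustration. After a successful /g match in scalar context, pos() holds the offset just past the match, so for a one-character match it equals the 1-based position of the offending character.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the 1-based position of the first character
# that is not A, C, G or T, or undef (in scalar context) if there is none.
sub first_bad_position {
    my ($dna) = @_;
    if ($dna =~ /[^ACGT]/g) {
        # pos() is the offset just past the single matched character
        return pos($dna);
    }
    return;
}

for my $seq (qw(ATGTCAHCGT ACGT)) {
    my $pos = first_bad_position($seq);
    print defined($pos)
        ? "$seq: non-nucleotide at position $pos\n"
        : "$seq: ok\n";
}
# prints:
#   ATGTCAHCGT: non-nucleotide at position 7
#   ACGT: ok
```

Note that this reports 7 for the H in "ATGTCAHCGT", because it counts from 1; a 0-based count would give 6.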
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Fantastic, thank you!

Re: If/else within a foreach loop to check strings within an array
by LanX (Saint) on Oct 19, 2013 at 22:03 UTC
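    You are getting the precedence wrong: eq binds more tightly than ||, so what you are effectively doing is ($base eq 'A') || 'T' || 'C' || 'G'. Since the non-empty string 'T' is always true, the condition never fails and the else branch is never reached.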

    There are plenty of alternatives, like:

    $base eq 'A' || $base eq 'T' || $base eq 'C' || $base eq 'G'
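As a rough illustration of the point above, the sketch below compares the original condition with two other possible alternatives, a hash lookup and an anchored character class. All sub names and the %is_base hash are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical lookup table of the valid bases.
my %is_base = map { $_ => 1 } qw(A T C G);

# The original condition: parsed as ($base eq 'A') || 'T' || 'C' || 'G',
# which is true for every input because the string 'T' is true.
sub buggy_check {
    my ($base) = @_;
    return ($base eq 'A' || 'T' || 'C' || 'G') ? 1 : 0;
}

# Alternative 1: hash lookup.
sub hash_check {
    my ($base) = @_;
    return exists $is_base{$base} ? 1 : 0;
}

# Alternative 2: anchored character class, exactly one valid base.
sub regex_check {
    my ($base) = @_;
    return $base =~ /\A[ATCG]\z/ ? 1 : 0;
}

for my $base (qw(A H)) {
    printf "%s: buggy=%d hash=%d regex=%d\n",
        $base, buggy_check($base), hash_check($base), regex_check($base);
}
# prints:
#   A: buggy=1 hash=1 regex=1
#   H: buggy=1 hash=0 regex=0
```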

    Cheers Rolf

    ( addicted to the Perl Programming Language)

Re: If/else within a foreach loop to check strings within an array
by BrowserUk (Patriarch) on Oct 19, 2013 at 22:39 UTC

    Whilst for the very short input in your example choroba's solution is perfectly fine, for real sequences, which are typically very large, splitting the sequence into an array of scalars each holding a single char and then processing them one at a time is grossly inefficient in both time and memory.

    Far more efficient for your stated purpose is to process the sequence as a string:

    my $dna_input = $ARGV[0];
    # printf (not print) so that %d is filled in; /i also accepts uppercase bases
    printf "Non-nucleotide '%s' at posn %d\n", $1, $-[0]
        while $dna_input =~ m[([^acgt])]gi;
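The string-scanning idea can also be wrapped in a sub that collects every offender instead of printing as it goes. This is only a sketch: the name bad_offsets is hypothetical, and it assumes that both upper- and lowercase bases should be accepted. $-[0] is the 0-based offset where the last match started.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [character, 0-based offset] pairs
# for every character that is not a, c, g or t (case-insensitive).
sub bad_offsets {
    my ($seq) = @_;
    my @hits;
    while ($seq =~ /([^ACGT])/gi) {
        push @hits, [ $1, $-[0] ];   # $-[0]: start offset of this match
    }
    return @hits;
}

printf "Non-nucleotide '%s' at offset %d\n", @$_
    for bad_offsets('ATGTCAHCGTN');
# prints:
#   Non-nucleotide 'H' at offset 6
#   Non-nucleotide 'N' at offset 10
```

No per-character array is built, so the extra memory is proportional to the number of offending characters, not to the length of the sequence.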

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      > Whilst for the very short input in your example choroba's solution is perfectly fine,

      Read again! No big difference to your approach.

      Cheers Rolf


        You're correct. My mistake.


Re: If/else within a foreach loop to check strings within an array
by drmrgd (Beadle) on Oct 20, 2013 at 11:46 UTC
    Is it possible to have more than one non-standard nucleotide in your sequence? If so, you might consider looking for all the non-matches:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $dna_input = shift;
    my @dna = split( '', $dna_input );
    my (@index) = grep { $dna[$_] !~ /[ATCG]/ } 0..$#dna;
    print "Non-nucleotide at position $_\n" for @index;
Re: If/else within a foreach loop to check strings within an array
by boftx (Deacon) on Oct 20, 2013 at 19:53 UTC

    getc comes to mind for char-at-a-time if one needs to process a file with potentially very long lines (or no line/record endings at all) instead of taking input directly as a command line arg.
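A possible sketch of the getc idea, under some assumptions of mine: the sub name scan_handle is invented, line endings are skipped without being counted, and an in-memory filehandle stands in for a real file so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: reads one character at a time from a filehandle and
# returns [character, 0-based offset] pairs for anything that is not
# A, C, G or T. Newlines and carriage returns are skipped and not counted
# (an assumption; adjust if offsets should include them).
sub scan_handle {
    my ($fh) = @_;
    my $offset = 0;
    my @bad;
    while (defined(my $ch = getc($fh))) {
        next if $ch eq "\n" || $ch eq "\r";
        push @bad, [ $ch, $offset ] unless $ch =~ /[ACGT]/;
        $offset++;
    }
    return @bad;
}

# In-memory filehandle used here in place of a real sequence file.
my $data = "ACGTAC\nGHTTN\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
printf "Non-nucleotide '%s' at offset %d\n", @$_ for scan_handle($fh);
close $fh;
# prints:
#   Non-nucleotide 'H' at offset 7
#   Non-nucleotide 'N' at offset 10
```

Only one character is held at a time, so this works on input with very long lines or no line endings at all, though calling getc once per character is slower than reading in larger chunks.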

    The answer to the question "Can we do this?" is always an emphatic "Yes!" Just give me enough time and money.