PerlMonks  

Re: How can I get correct result in counting 3-letter words?

by ikegami (Pope)
on Apr 22, 2012 at 20:02 UTC (#966491)


in reply to How can I get correct result in counting 3-letter words?

You did not constrain where any of the triplets could match, so /GCT/ig also counts occurrences that straddle two adjacent triplets. You could replace

while ($seq =~ /GCT/ig) { $GCT++; }

with

while ($seq =~ /\G(?:...)*?GCT/sig) { $GCT++; }

or with

while ($seq =~ /\G(...)/sg) { $GCT++ if uc($1) eq 'GCT'; }

but let's take it one step further and use

while ($seq =~ /\G(...)/sg) { $counts{uc($1)}++; }

That reduces your program to

my %counts;
++$counts{uc($_)} for $seq =~ /.../sg;   # the "$" sigil was missing as first posted

for my $l1 (qw( T C A G )) {
   for my $l2 (qw( T C A G )) {
      for my $l3 (qw( T C A G )) {
         my $k = "$l1$l2$l3";
         my $v = $counts{$k} || 0;   # triplets never seen count as 0
         print("$k=$v;");
      }
      print("\n");
   }
}
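To see why the anchoring matters, here is a minimal sketch that compares the unconstrained count with the two \G-anchored variants above. The sequence 'GCTAGCTGC' is a made-up example, not the OP's data.

```perl
use strict;
use warnings;

# Hypothetical sequence; read in frame it is GCT AGC TGC.
my $seq = 'GCTAGCTGC';

# Unconstrained: counts GCT at any offset, including the one that
# straddles the second and third triplets.
my $anywhere = () = $seq =~ /GCT/ig;

# Anchored with \G: each match must start where the previous one ended,
# and only whole triplets may be skipped before GCT.
my $in_frame = 0;
pos($seq) = 0;
$in_frame++ while $seq =~ /\G(?:...)*?GCT/sig;

# Same idea, but tallying every triplet at once.
my %counts;
pos($seq) = 0;
while ($seq =~ /\G(...)/sg) { $counts{uc($1)}++; }

print "anywhere=$anywhere in_frame=$in_frame\n";
print join(' ', map { "$_=$counts{$_}" } sort keys %counts), "\n";
```

The unconstrained regex reports two hits, while both in-frame approaches see GCT only once.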


Re^2: How can I get correct result in counting 3-letter words?
by NetWallah (Monsignor) on Apr 22, 2012 at 23:14 UTC
    That does not compile (missing $ sigil).

    Did you mean the second line to read

    ++$counts{uc($_)} for $a =~ /[TCAG]{3}/sg;
    With that, I get the counts the OP expects.

                 All great truths begin as blasphemies.
                       ― George Bernard Shaw, writer, Nobel laureate (1856-1950)

Re^2: How can I get correct result in counting 3-letter words?
by BillKSmith (Friar) on Apr 22, 2012 at 23:53 UTC
    I prefer the function "variations_with_repetition" in the module Algorithm::Combinatorics rather than the do-it-yourself approach of generating the $k's.
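A rough sketch of what that might look like, assuming Algorithm::Combinatorics is installed from CPAN. The %counts contents here are placeholder values for illustration, not results from the OP's sequence.

```perl
use strict;
use warnings;
use Algorithm::Combinatorics qw(variations_with_repetition);

# Placeholder tallies; in the real program %counts comes from the regex loop.
my %counts = ( GCT => 2, TTT => 1 );

# In list context the function returns every length-3 tuple over the
# four letters as an array reference; join each one into a string key.
my @codons = map { join '', @$_ }
             variations_with_repetition( [qw( T C A G )], 3 );

# Print four entries per row, like the nested-loop version.
for my $i ( 0 .. $#codons ) {
    my $k = $codons[$i];
    my $v = $counts{$k} || 0;
    print "$k=$v;";
    print "\n" if $i % 4 == 3;
}
```

This trades three nested loops for one call, at the cost of a non-core dependency.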
Re^2: How can I get correct result in counting 3-letter words?
by aaron_baugher (Chaplain) on Apr 23, 2012 at 01:18 UTC

    In my benchmarks, substr is about twice as fast as a /.../g regex for getting the next three characters:

    cail:~/work/perl/monks$ cat 966488.pl
    #!/usr/bin/env perl
    use Modern::Perl;
    use Benchmark qw(:all);

    my $string = '';
    for (1..1_000_000){ # make a million-char string
        $string .= qw(A C G T)[rand(4)];
    }

    cmpthese( 100, {
        'regex'     => \&regex,
        'substring' => \&substring,
    });

    sub substring {
        my $str = $string;
        my %h;
        while(length($str) % 3){ # snip to 3-letter boundary
            substr($str,-1, 1, '');
        }
        while($_ = substr($str,0,3,'')){
            $h{$_}++;
        }
    }

    sub regex {
        my $str = $string;
        my %h;
        for ($str =~ /.../g){
            $h{$_}++;
        }
    }
    cail:~/work/perl/monks$ perl 966488.pl
                Rate     regex substring
    regex     5.78/s        --      -49%
    substring 11.4/s       97%        --

    Of course, if you only want to match certain letters, then you're back to a regex. But in that case, I might still try stripping out all the stuff I don't want with tr//, followed by substr to break it into pieces.

    Aaron B.
    My Woefully Neglected Blog, where I occasionally mention Perl.

      Humm, you didn't actually use my code. (Not that the results would be visibly different.)

      Your suggestion to remove the offending letters is broken if said letters can appear anywhere but the beginning and end of the string. "AAAGNTTT" should give "AAA" and "TTT", but your algorithm would give "AAA" and "GTT".

        True, I was just curious about the performance difference between substr and /.../g. And I don't know this bioinformatics stuff well enough to know what's a valid group and what isn't, and whether you can assume things will break on the right boundaries (or what to do with extra letters if they don't). But doesn't your final solution with /.../gs give "AAA" and "GNT"? Should it be /[ACGT]{3}/gs to make it skip to the next valid set of three?

        Aaron B.
        My Woefully Neglected Blog, where I occasionally mention Perl.
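The three behaviours discussed in this sub-thread can be checked side by side on the "AAAGNTTT" example. This is only a sketch: it assumes uppercase input, and which result is "correct" depends on how the data is meant to be framed, which the thread leaves open.

```perl
use strict;
use warnings;

my $seq = 'AAAGNTTT';

# Plain /.../: fixed three-character windows, invalid letters included.
my @plain = $seq =~ /.../sg;

# /[ACGT]{3}/: skips forward to the next run of three valid letters.
my @valid = $seq =~ /[ACGT]{3}/g;

# Strip everything except A, C, G, T first, then cut into threes.
# Removing the N shifts the frame for everything that follows it.
( my $stripped = $seq ) =~ tr/ACGT//cd;
my @shifted = $stripped =~ /.../g;

print "plain:   @plain\n";     # AAA GNT
print "valid:   @valid\n";     # AAA TTT
print "shifted: @shifted\n";   # AAA GTT
```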
