Re^2: Using grep in a scalar context

by AnomalousMonk (Chancellor)
on Feb 07, 2013 at 16:38 UTC ( #1017691=note )

in reply to Re: Using grep in a scalar context
in thread Using grep in a scalar context

I have no bioinformatic background, but I'd like to offer a couple of comments on your code, specifically the version that counts overlapping letter pairs (would 'digrams' be an appropriate term for these?).

    my %acids;
    for(my $i = 0; $i < length($string)-1; $i++){
        my $amino = substr($string, $i, 2);
        if(exists $acids{$amino}){
            $acids{$amino}++;
        }else{
            $acids{$amino} = 1;
        }
        #print "$amino\n";
    }

Because it is not necessary to check for the existence of a hash key before incrementing its value (due to autovivification), the body of this for-loop can be reduced to a single statement:
    ++$acids{ substr $string, $i, 2 }
This will almost certainly yield a speed benefit.
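
To make the equivalence concrete, here is a minimal sketch that runs both loop bodies over the same input and prints the two counts side by side. The sample string and the hash names %checked and %direct are my own choices for illustration, not taken from your code.

```perl
use strict;
use warnings;

my $string = 'ABCCCDEAB';   # sample input, assumed for illustration

# Original style: explicit exists() check before incrementing.
my %checked;
for (my $i = 0; $i < length($string) - 1; $i++) {
    my $amino = substr($string, $i, 2);
    if (exists $checked{$amino}) {
        $checked{$amino}++;
    }
    else {
        $checked{$amino} = 1;
    }
}

# Reduced style: incrementing a missing key creates it.
# The undefined value counts as 0, and ++ does not trigger
# an "uninitialized" warning.
my %direct;
for (my $i = 0; $i < length($string) - 1; $i++) {
    ++$direct{ substr $string, $i, 2 };
}

# Print both counts for every pair; the two columns should agree.
printf "%s => %d / %d\n", $_, $checked{$_}, $direct{$_}
    for sort keys %checked;
```

Both hashes end up with the same keys and counts, so the if/else adds nothing but extra hash lookups.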

Alternatively, in 5.10+ versions of Perl, the entire for-loop can be replaced by a single regex (tested):
    $string =~ m{ (?= (..) (?{ ++$pairs2{$^N} }) (*FAIL)) }xms;
This may or may not increase speed; you will have to Benchmark this for yourself. The alternate regex
    m{ (?= .. (?{ ++$pairs2{${^MATCH}} }) (*FAIL)) }xmsp
also works (note the additional /p regex modifier) and may be slightly faster because no capturing group is used. Again, Benchmark-ing will tell the tale.
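
If it helps, a comparison along these lines could be set up with the core Benchmark module. This is only a sketch: the random 10,000-letter test string, the 20-letter alphabet, and the sub names by_substr, by_capture and by_match are assumptions of mine, and real sequence data may well behave differently. I make no claim here about which variant wins.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: a random 10,000-letter string over 20 letters.
my @letters = ('A' .. 'T');
our $string = join '', map { $letters[ rand @letters ] } 1 .. 10_000;

# Package variable so the (?{ ... }) code blocks can see it on any 5.10+ perl.
our %pairs;

# Plain for-loop with substr.
sub by_substr {
    %pairs = ();
    for (my $i = 0; $i < length($string) - 1; ++$i) {
        ++$pairs{ substr $string, $i, 2 };
    }
    return scalar keys %pairs;
}

# Regex with a capturing group, read back through $^N.
sub by_capture {
    %pairs = ();
    $string =~ m{ (?= (..) (?{ ++$pairs{$^N} }) (*FAIL)) }xms;
    return scalar keys %pairs;
}

# Regex without a capturing group, read back through ${^MATCH} (needs /p).
sub by_match {
    %pairs = ();
    $string =~ m{ (?= .. (?{ ++$pairs{${^MATCH}} }) (*FAIL)) }xmsp;
    return scalar keys %pairs;
}

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    substr  => \&by_substr,
    capture => \&by_capture,
    match   => \&by_match,
});
```

The table printed by cmpthese shows the rate of each variant and the relative differences; it is worth rerunning with strings of the length you actually process.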

    >perl -wMstrict -le
    "use Test::More tests => 2;
     use Data::Dump;
     ;;
     my $string = 'ABCCCDEAB';
     ;;
     my %pairs1;
     $pairs1{$_}++ for $string =~ /(?=(..))/g;
     ;;
     local our %pairs2;
     $string =~ m{ (?= .. (?{ ++$pairs2{${^MATCH}} }) (*FAIL)) }xmsp;
     ;;
     my %pairs3;
     for (my $i = 0; $i < length($string) - 1; ++$i) {
       ++$pairs3{ substr $string, $i, 2 }
       }
     ;;
     dd \%pairs1, \%pairs2, \%pairs3;
     is_deeply \%pairs1, \%pairs2, '1 & 2, same results';
     is_deeply \%pairs1, \%pairs3, '1 & 3, same results';
    "
    1..2
    (
      { AB => 2, BC => 1, CC => 2, CD => 1, DE => 1, EA => 1 },
      { AB => 2, BC => 1, CC => 2, CD => 1, DE => 1, EA => 1 },
      { AB => 2, BC => 1, CC => 2, CD => 1, DE => 1, EA => 1 },
    )
    ok 1 - 1 & 2, same results
    ok 2 - 1 & 3, same results
