I have no bioinformatics background, but I'd like to offer a couple of comments on your code, specifically the version that counts overlapping letter pairs (would 'digrams' be an appropriate term for these?).
my %acids;
for (my $i = 0; $i < length($string)-1; $i++) {
    my $amino = substr($string, $i, 2);
    if (exists $acids{$amino}) {
        $acids{$amino}++;
    } else {
        $acids{$amino} = 1;
    }
    #print "$amino\n";
}
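It is not necessary to check whether a hash key exists before incrementing its value: a missing element is created on first use, and ++ treats its undefined value as 0 without an "uninitialized" warning. The body of this for-loop can therefore be reduced to a single statement:
++$acids{ substr $string, $i, 2 };
This drops the separate exists lookup and the temporary $amino copy on every iteration, so it should be at least somewhat faster.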
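As a minimal sketch, the simplified loop can be packaged as a self-contained subroutine. The name count_pairs is hypothetical, and the sample string is the one used in the test further down.

```perl
use strict;
use warnings;

# Hypothetical helper name: counts every overlapping two-character
# substring of $string and returns a hash reference of counts.
sub count_pairs {
    my ($string) = @_;
    my %acids;
    # The last start position with two characters after it is length - 2.
    for (my $i = 0; $i < length($string) - 1; ++$i) {
        # A missing key is created on first use; ++ treats its undef
        # value as 0 and emits no "uninitialized" warning.
        ++$acids{ substr $string, $i, 2 };
    }
    return \%acids;
}

my $counts = count_pairs('ABCCCDEAB');
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

This prints AB => 2, BC => 1, CC => 2, CD => 1, DE => 1 and EA => 1, one per line.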
Alternatively, in Perl 5.10 or later (needed for the (*FAIL) backtracking verb), the entire for-loop can be replaced by a single regex (tested):
$string =~ m{ (?= (..) (?{ ++$pairs2{$^N} }) (*FAIL)) }xms;
This may or may not be faster; you will have to benchmark it yourself.
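If the regex version is to be called repeatedly, one way to wrap it in a subroutine is sketched below; count_pairs_regex is a hypothetical name. As in the test, the counts go into a package hash declared with local our rather than a lexical; my understanding is that this sidesteps the closure problems lexical variables had inside (?{ }) blocks in older perls.

```perl
use strict;
use warnings;
use 5.010;    # (*FAIL) requires Perl 5.10 or later

# Hypothetical helper name. Counts go into a localized package hash
# (as in the original test) rather than a lexical one.
sub count_pairs_regex {
    my ($string) = @_;
    local our %pairs;    # emptied for this call, restored on return
    # At each start position the lookahead captures two characters, the
    # code block counts them via $^N, and (*FAIL) forces the match to fail
    # so the engine retries at the next start position.
    $string =~ m{ (?= (..) (?{ ++$pairs{$^N} }) (*FAIL)) }xms;
    return { %pairs };   # copy out before "local" restores the old hash
}

my $counts = count_pairs_regex('ABCCCDEAB');
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

Returning a copy matters here: once the sub exits, local puts back whatever %pairs held before the call.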
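The alternative regex
m{ (?= .. (?{ ++$pairs2{${^MATCH}} }) (*FAIL)) }xmsp
also works (note the additional /p modifier, which makes ${^MATCH} available) and may be slightly faster because it uses no capturing group. Again, benchmarking will tell the tale. The following command-line test checks that a plain /g match collecting every overlapping pair, the /p regex, and the simplified loop all produce the same counts: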
>perl -wMstrict -le
"use Test::More tests => 2;
use Data::Dump;
;;
my $string = 'ABCCCDEAB';
;;
my %pairs1;
$pairs1{$_}++ for $string =~ /(?=(..))/g;
;;
local our %pairs2;
$string =~ m{ (?= .. (?{ ++$pairs2{${^MATCH}} }) (*FAIL)) }xmsp;
;;
my %pairs3;
for (my $i = 0; $i < length($string) - 1; ++$i) {
++$pairs3{ substr $string, $i, 2 }
}
;;
dd \%pairs1, \%pairs2, \%pairs3;
is_deeply \%pairs1, \%pairs2, '1 & 2, same results';
is_deeply \%pairs1, \%pairs3, '1 & 3, same results';
"
1..2
(
{ AB => 2, BC => 1, CC => 2, CD => 1, DE => 1, EA => 1 },
{ AB => 2, BC => 1, CC => 2, CD => 1, DE => 1, EA => 1 },
{ AB => 2, BC => 1, CC => 2, CD => 1, DE => 1, EA => 1 },
)
ok 1 - 1 & 2, same results
ok 2 - 1 & 3, same results
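Since the speed claims above are only guesses, here is a sketch of how one might measure them with the core Benchmark module's cmpthese. The variant names, the repeated input string and the -1 count (run each variant for at least one CPU second) are my own choices for illustration; results will depend on the perl version and the input.

```perl
use strict;
use warnings;
use 5.010;                      # (*FAIL) requires Perl 5.10 or later
use Benchmark qw(cmpthese);

# Assumption: a longer, made-up input gives steadier timings than the
# 9-character example string.
my $string = 'ABCCCDEAB' x 1000;

our %pairs;                     # package hash filled by the regex code blocks

my %subs = (
    # The original loop: exists check, then increment or initialize.
    exists_loop => sub {
        my %acids;
        for (my $i = 0; $i < length($string) - 1; $i++) {
            my $amino = substr($string, $i, 2);
            if (exists $acids{$amino}) { $acids{$amino}++ }
            else                       { $acids{$amino} = 1 }
        }
        return \%acids;
    },
    # Same loop reduced to a single increment.
    increment_loop => sub {
        my %acids;
        for (my $i = 0; $i < length($string) - 1; ++$i) {
            ++$acids{ substr $string, $i, 2 };
        }
        return \%acids;
    },
    # Regex with a capturing group, read back through $^N.
    regex_capture => sub {
        local %pairs;
        $string =~ m{ (?= (..) (?{ ++$pairs{$^N} }) (*FAIL)) }xms;
        return { %pairs };
    },
    # Regex without a capturing group, using ${^MATCH} and /p.
    regex_match_var => sub {
        local %pairs;
        $string =~ m{ (?= .. (?{ ++$pairs{${^MATCH}} }) (*FAIL)) }xmsp;
        return { %pairs };
    },
);

# Turn a count hash into a sorted "key=value,..." string for comparison.
sub canon {
    my ($h) = @_;
    return join ',', map { "$_=$h->{$_}" } sort keys %$h;
}

# Refuse to time variants that disagree with one another.
my $want = canon($subs{exists_loop}->());
for my $name (sort keys %subs) {
    die "$name gives different counts\n"
        unless canon($subs{$name}->()) eq $want;
}

cmpthese(-1, \%subs);           # run each variant for at least 1 CPU second
```

The agreement check up front guards against timing code that gives a different answer; cmpthese then prints a rate table comparing every variant with every other.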