An important part of writing benchmark code is making sure the alternatives do the same thing. Your do_eval subroutine had a minor problem (only the first match had $genome =~ in front of it, instead of all of them), which I corrected. But when I then compared the return values of the two subroutines, I noticed they were different: do_eval increments $count once, and only if every regex matches, while do_qr increments it once per matching regex:
sub do_eval {
    my $genome = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
    my @regexes = ('AGT', 'ATC'); # don't care
    my $count = 0;
    my $code = 'if ('
        . join(' && ',
            map { "\$genome =~ /$_/" } @regexes)
        . ') { $count++; }';
    eval $code;
    die "Error: $@\n Code:\n$code\n" if ($@);
    return $count; # returns 1
}
sub do_qr {
    my $string = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
    my @regexes = ("AGT", "ATC"); # don't care
    my $count = 0;
    my @compiled = map qr/$_/, @regexes;
    for (my $i = 0; $i < @regexes; $i++) {
        if ($string =~ /$compiled[$i]/) {
            $count++;
        }
    }
    return $count; # returns 2
}
To fix that, I changed your do_eval sub to the following:
sub do_eval {
    my $genome = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
    my @regexes = ('AGT', 'ATC'); # don't care
    my $count = 0;
    my $code = join ";",
        map { "\$count++ if \$genome =~ /$_/" } @regexes;
    eval $code;
    die "Error: $@\n Code:\n$code\n" if ($@);
    return $count; # returns 2
}
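To see why the two versions of do_eval return different counts, it helps to print the code strings they generate. A minimal sketch (the variable names $original and $revised are mine, purely for illustration):

```perl
use strict;
use warnings;

my @regexes = ('AGT', 'ATC');

# First version: one "if" whose condition ANDs all the matches together,
# so $count can only ever go up by one.
my $original = 'if ('
    . join(' && ', map { "\$genome =~ /$_/" } @regexes)
    . ') { $count++; }';

# Revised version: one independent statement per regex,
# so $count goes up once for every regex that matches.
my $revised = join ";", map { "\$count++ if \$genome =~ /$_/" } @regexes;

print "original: $original\n";
print "revised:  $revised\n";
# original: if ($genome =~ /AGT/ && $genome =~ /ATC/) { $count++; }
# revised:  $count++ if $genome =~ /AGT/;$count++ if $genome =~ /ATC/
```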
And I decided to add my own take on the matter, which generates One Big Regex, rather than a bunch of them:
sub do_genre {
    my $genome = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
    my @regexes = ("AGT", "ATC"); # don't care
    my $regex = join "|", map "($_)", @regexes;
    # Without /g, a list-context match returns one element per capture group.
    my $count = () = $genome =~ /$regex/;
    return $count; # returns 2
}
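One thing worth checking before trusting do_genre in a benchmark is what that list assignment actually counts. Without /g, a successful match in list context returns one element per capture group, whether or not that group took part in the match. The sketch below (capture_count is a hypothetical helper, and the shorter test strings are made up for illustration) suggests the result is the number of groups whenever any alternative matches, which agrees with do_qr for this genome but not in general:

```perl
use strict;
use warnings;

# Same construction as in do_genre: "(AGT)|(ATC)"
my $regex = join "|", map "($_)", ('AGT', 'ATC');

# Hypothetical helper: number of elements the list-context match returns.
sub capture_count {
    my ($input) = @_;
    my @captures = $input =~ /$regex/;
    return scalar @captures;
}

for my $input ("AGTATCGATC", "AGTGGGGGG", "CCCCCCCC") {
    print "$input: ", capture_count($input), "\n";
}
```

The first two strings both give 2, even though the second contains only AGT; the third gives 0 because the match fails and returns an empty list.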
When I run the benchmark, I get the following results:
             Rate  do_eval    do_qr do_genre
do_eval   15531/s       --     -65%     -90%
do_qr     44671/s     188%       --     -72%
do_genre 157893/s     917%     253%       --
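Which just goes to show that the string eval is the slowest of the three: looping over precompiled regexes is nearly three times as fast, and the single combined regex is about ten times as fast as the eval. A different algorithm makes a big difference.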
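The harness itself isn't shown here; something along these lines, using the core Benchmark module, would produce a table of that shape. Treat it as a sketch under assumptions: passing the genome and patterns in as arguments, the sanity check, and the -1 setting (run each sub for at least one CPU second) are my choices, not necessarily what produced the numbers above, so the rates you get will differ.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $genome  = "AGTATCGATCGATGCATGCTAGCTAGCTAGCTAGCTAGCTAGSTGCTAGCT";
my @regexes = ('AGT', 'ATC');

sub do_eval {
    my ($genome, @regexes) = @_;
    my $count = 0;
    my $code  = join ";", map { "\$count++ if \$genome =~ /$_/" } @regexes;
    eval $code;
    die "Error: $@\n Code:\n$code\n" if $@;
    return $count;
}

sub do_qr {
    my ($string, @regexes) = @_;
    my $count    = 0;
    my @compiled = map qr/$_/, @regexes;
    for (my $i = 0; $i < @regexes; $i++) {
        $count++ if $string =~ /$compiled[$i]/;
    }
    return $count;
}

sub do_genre {
    my ($genome, @regexes) = @_;
    my $regex = join "|", map "($_)", @regexes;
    my $count = () = $genome =~ /$regex/;
    return $count;
}

# Sanity check first: a comparison only means something
# if every alternative returns the same answer.
my %subs = (do_eval => \&do_eval, do_qr => \&do_qr, do_genre => \&do_genre);
for my $name (sort keys %subs) {
    my $got = $subs{$name}->($genome, @regexes);
    die "$name returned $got, expected 2\n" unless $got == 2;
}

# Negative count: run each sub for at least that many CPU seconds.
cmpthese(-1, {
    do_eval  => sub { do_eval($genome, @regexes) },
    do_qr    => sub { do_qr($genome, @regexes) },
    do_genre => sub { do_genre($genome, @regexes) },
});
```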