You have not used a fair test: note that my regular expression included the metacharacters ^ and $. If I adjust your tests to take this into account and switch to:
sub regex {
    my $cnt = 0;
    for (@words) {
        ++$cnt if /^abcb$/i;
    }
    return $cnt;
}
I get the output:
In 4096 words, 16 are 'abcb'
Rate uccmp regex
uccmp 1192/s -- -5%
regex 1255/s 5% --
In 4096 words, 0 are 'abcb'
Rate uccmp regex
uccmp 1225/s -- -3%
regex 1264/s 3% --
In 4096 words, 4096 are 'abcb'
Rate regex uccmp
regex 970/s -- -23%
uccmp 1260/s 30% --
which obviously compares much better. This still does not account for the fact that the string comparison requires a chomp, which the regular expression does not. Modifying your benchmark to take this into account yields the results:
In 4096 words, 16 are 'abcb'
Rate uccmp regex
uccmp 812/s -- -32%
regex 1197/s 47% --
In 4096 words, 0 are 'abcb'
Rate uccmp regex
uccmp 861/s -- -31%
regex 1255/s 46% --
In 4096 words, 4096 are 'abcb'
Rate uccmp regex
uccmp 856/s -- -11%
regex 964/s 13% --
which I think clearly favors the regular expression. In addition, if you really wanted to squeeze out performance, you could skip the split in the OP as well, which yields:
In 4096 words, 16 are 'abcb'
Rate uccmp regex
uccmp 200/s -- -74%
regex 767/s 283% --
In 4096 words, 0 are 'abcb'
Rate uccmp regex
uccmp 200/s -- -74%
regex 756/s 278% --
In 4096 words, 4096 are 'abcb'
Rate uccmp regex
uccmp 206/s -- -66%
regex 603/s 193% --
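For illustration, here is a minimal sketch of the two points above: the anchored regex needs no chomp, and with the /m modifier it needs no split either. This is not the benchmark code that produced the numbers above; the sample data and the variable names ($text, @lines, $nosplit) are made up for the example, and I am assuming the input is one word per line.

```perl
use strict;
use warnings;

# Hypothetical input: one word per line, as if slurped from a file.
my $text = join '', map { "$_\n" } qw(abcb ABCB xabcb abcbx Abcb abc);

# Split into lines; split /^/ keeps the trailing "\n" on each element.
my @lines = split /^/, $text;

# The string comparison only works once the newline has been removed.
my $uccmp = 0;
for my $line (@lines) {
    my $copy = $line;    # copy, so chomp does not modify @lines
    chomp $copy;
    ++$uccmp if uc($copy) eq 'ABCB';
}

# The anchored regex needs no chomp: $ also matches just before a
# trailing newline.
my $regex = 0;
for (@lines) {
    ++$regex if /^abcb$/i;
}

# No split at all: with /m, ^ and $ match at every line boundary, and
# the "= () =" idiom counts the matches found by /g.
my $nosplit = () = $text =~ /^abcb$/mig;

print "uccmp=$uccmp regex=$regex nosplit=$nosplit\n";
```

All three counts agree on this sample, but only the last one avoids building the intermediate list, which is presumably where the extra margin comes from.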
Update: As ikegami points out, I failed to localize the arrays to the test routines, so a large number of the operations were no-ops. Fixing that by adding my @words = @words; where appropriate reduced the margins but preserved the ordering. I suspect the reduction is just a function of the linear overhead of copying the large arrays. If this is incorrect, I would appreciate insight.
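A minimal sketch of the problem described in the update, as I understand it: inside for (@words), $_ is an alias to each element, so chomp edits the shared array and every later benchmark iteration chomps strings that no longer have a newline. The sub names and the three-word sample are hypothetical; only the my @words = @words; line comes from the fix described above.

```perl
use strict;
use warnings;

# Hypothetical sample data; each word still carries its newline.
my @words = map { "$_\n" } qw(abcb foo ABCB);

# Unlocalized version: chomp works on aliases into the shared @words,
# so the first call strips every newline and later calls do nothing.
sub uccmp_shared {
    my $cnt = 0;
    for (@words) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

# Localized version: the right-hand @words is still the outer array,
# so this copies it and chomp only touches the private copy.
sub uccmp_copy {
    my @words = @words;
    my $cnt = 0;
    for (@words) {
        chomp;
        ++$cnt if uc($_) eq 'ABCB';
    }
    return $cnt;
}

my $copy_count   = uccmp_copy();
my $still_intact = grep { /\n\z/ } @words;    # newlines still present
my $shared_count = uccmp_shared();
my $after_shared = grep { /\n\z/ } @words;    # newlines now gone

print "copy=$copy_count intact=$still_intact ",
      "shared=$shared_count left=$after_shared\n";
```

Both versions return the same count, which is why the mistake does not show up in the "In 4096 words" lines and only distorts the timings.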