In my benchmarks, substr is about twice as fast as a /.../g regex for grabbing the next few characters at a time (three, in this test):
cail:~/work/perl/monks$ cat 966488.pl
#!/usr/bin/env perl
use Modern::Perl;
use Benchmark qw(:all);

my $string = '';
for (1..1_000_000){ # make a million-char string
    $string .= qw(A C G T)[rand(4)];
}

cmpthese( 100, {
    'regex'     => \&regex,
    'substring' => \&substring,
});

sub substring {
    my $str = $string;
    my %h;
    while(length($str) % 3){ # snip to 3-letter boundary
        substr($str,-1, 1, '');
    }
    while($_ = substr($str,0,3,'')){ # remove and return the first 3 chars
        $h{$_}++;
    }
}

sub regex {
    my $str = $string;
    my %h;
    for ($str =~ /.../g){ # list of every 3-char match
        $h{$_}++;
    }
}
cail:~/work/perl/monks$ perl 966488.pl
            Rate     regex substring
regex     5.78/s        --      -49%
substring 11.4/s       97%        --
Of course, if you only want to match certain letters, you're back to a regex. Even then, I might still try stripping out everything I don't want with tr///, then using substr to break what's left into pieces.
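A minimal sketch of that tr-then-substr idea, assuming (hypothetically) that only A, C, G and T matter, that anything else should be thrown away, and that a trailing fragment shorter than three characters is simply dropped. The sub name count_triplets and the sample string are made up for illustration; the loop walks the string by offset instead of chopping it with four-argument substr, but either style should do the job.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: keep only A/C/G/T, then count 3-letter chunks.
sub count_triplets {
    my ($str) = @_;            # work on a copy; the caller's string is untouched
    $str =~ tr/ACGT//cd;       # /c complements the list, /d deletes: drop every non-ACGT char
    my %count;
    # Step through the cleaned string 3 chars at a time;
    # a leftover fragment shorter than 3 is ignored.
    for (my $pos = 0; $pos + 3 <= length $str; $pos += 3) {
        $count{ substr($str, $pos, 3) }++;
    }
    return \%count;
}

# Made-up input with junk mixed in (N's, a lowercase x, a newline).
my $counts = count_triplets("ACGNNTTAxCGGA\nTTC");
print "$_ => $counts->{$_}\n" for sort keys %$counts;
```

With that input, tr leaves "ACGTTACGGATTC", so the script prints one count each for ACG, ATT, CGG and TTA, and the stray final C is discarded.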