This is a kind of continuation of the questions I've been asking after bumping into unexpected regex performance issues; the last one was 11155604, I think. This one is also observed with the very latest strawberry-perl-5.40.0.1-64bit-PDL, so perhaps it's something new.
I'm trying to improve one of the CPAN modules that deal with PDF. The string below simulates a classic cross-reference table, with the number of entries and the amount of preceding file data roughly the same as in one of the PDF files I use for tests.
Method (1) is similar to the original. I tried (4) first, with the vague idea of not creating useless copies of data. However, that is when I noticed that, while the other changes (not relevant here) were steady speed/memory gains, everything unexpectedly got very slow. So I concocted the SSCCE below to ask whether this is a bug in Perl. Also, strangely, the results of (4) vary somewhat from run to run, sometimes being as "fast" as 1.33 s.
(I now think I'll perhaps go with (3), after checking whether the global anchor is maintained/used elsewhere by the module. The question about a possible bug in Perl remains, as an accidental by-product of otherwise idle investigations.)
use strict;
use warnings;
use feature 'say';
use Time::HiRes 'time';
say $^V;
my $s = '*' x 5_000_000;
$s .= "0123456789 01234 n \n" x 40_000;
my $re = qr/ (\d{10}) \x{20} (\d{5}) \x{20} (\w) \s\s /x;
my ( $xref, $t );
# (1) peel off entry by entry
$xref = substr $s, 5_000_000; # from shorter string
$t = time;
for ( 0 .. 39_999 ) {
    my $entry = substr $xref, $_ * 20, 20;
    die unless $entry =~ / \A $re /x;
    # do something useful with captures
}
say time - $t;
# (2) global match (shorter string)
$xref = substr $s, 5_000_000;
$t = time;
for ( 0 .. 39_999 ) {
    die unless $xref =~ / \G $re /gx;
}
say time - $t;
# (3) global match (original string),
pos( $s ) = 5_000_000; # start from pos
$t = time;
for ( 0 .. 39_999 ) {
    die unless $s =~ / \G $re /gx;
}
say time - $t;
# (4) use reference to substr
$xref = \substr $s, 5_000_000;
$t = time;
for ( 0 .. 39_999 ) {
    die unless $$xref =~ / \G $re /gx;
}
say time - $t;
__END__
v5.40.0
0.0973920822143555
0.04703688621521
0.0475959777832031
3.08383107185364
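
For what it's worth, here is a minimal sketch of how approach (3) might look once the captures are actually used. It matches in place on the original string and puts pos() back afterwards, in case the global anchor turns out to be used elsewhere by the module. The sub name parse_xref_in_place, its arguments, and the shape of the returned data are made up for illustration and are not taken from the module; the small string at the bottom is only a scaled-down stand-in for the test data above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): parse $count fixed-width
# xref entries in place, starting at offset $start of the string that
# $sref points to. No copy of the string or of the table is made.
sub parse_xref_in_place {
    my ( $sref, $start, $count ) = @_;
    my $re = qr/ (\d{10}) \x{20} (\d{5}) \x{20} (\w) \s\s /x;

    my $saved_pos = pos( $$sref );    # remember the caller's \G anchor, if any
    pos( $$sref ) = $start;

    my @entries;
    for ( 1 .. $count ) {
        # /c keeps pos() on failure, so the message can say where it stopped
        $$sref =~ / \G $re /gcx
            or die "malformed xref entry at offset " . pos( $$sref ) . "\n";
        push @entries, [ $1 + 0, $2 + 0, $3 ];    # offset, generation, type
    }

    pos( $$sref ) = $saved_pos;       # restore; assigning undef resets pos
    return \@entries;
}

# Scaled-down stand-in for the test string above
my $s = '*' x 1_000;
$s .= "0123456789 01234 n \n" x 3;

my $entries = parse_xref_in_place( \$s, 1_000, 3 );
printf "%d entries, first: offset=%d gen=%d type=%s\n",
    scalar @$entries, @{ $entries->[0] };
```

Passing a plain reference to the whole string (\$s) rather than a reference to substr is deliberate here: it stays on the same code path as (3), which was one of the fast ones in the timings above. Whether saving and restoring pos() is needed at all depends on what the rest of the module does with it.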