Re^3: Request to correct the perl code for getting substrings

in reply to Re^2: Request to correct the perl code for getting substrings
in thread Request to correct the perl code for getting substrings

Don't use $&, it will slow down every regular expression in the program

You're quite right, Sir ;-)

#!/usr/bin/perl -w
use Benchmark;

my $string = "CCATGNNNTAACCNNATGNNTAGCC" x 10000;

timethese(10, {
              'matchamp' => sub {matchamp($string)},
              'matchpar' => sub {matchpar($string)},
});

sub matchamp {
  my $string = shift;
  my $matchlen = 0;
  while($string =~ /[AG]TG.*?T[AG][AG]/g) {
    $matchlen += length($&); # Actually reference $& to keep Perl from cheating
  }
  return $matchlen;
}

sub matchpar {
  my $string = shift;
  my $matchlen = 0;
  while ( $string =~ /([AG]TG.*?T[AG][AG])/g ) {
    $matchlen += length($1); # Reference $1, for fairness
  }
  return $matchlen;
}

yields:

Benchmark: timing 10 iterations of matchamp, matchpar...
  matchamp: 24 wallclock secs (23.48 usr +  0.00 sys = 23.48 CPU) @  0.43/s (n=10)
  matchpar: 26 wallclock secs (25.70 usr +  0.00 sys = 25.70 CPU) @  0.39/s (n=10)

The "$&" leg seems a tad faster, right?

BUT...

removing every trace of matchamp from the above program, so that $& no longer appears anywhere in it, yields:

Benchmark: timing 10 iterations of matchpar...
  matchpar:  1 wallclock secs ( 0.54 usr +  0.00 sys =  0.54 CPU) @ 18.52/s (n=10)

That's a speedup by a factor of ~26 in wallclock time, and closer to ~48 going by CPU time (25.70 vs. 0.54 secs). I'm impressed.

Thanks for pointing that out.
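
For the record, here is a minimal sketch of two ways to get at the whole match without writing $& anywhere. The sub names matchlen_offsets and matchlen_p are my own and not from the thread, and the /p modifier with ${^MATCH} assumes Perl 5.10 or later. I have not benchmarked these; they only show the idea.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Variant 1: use the offset arrays @- and @+.
# $-[0] is the start of the whole match, $+[0] the position just past its end.
sub matchlen_offsets {
    my ($string) = @_;
    my $matchlen = 0;
    while ($string =~ /[AG]TG.*?T[AG][AG]/g) {
        $matchlen += $+[0] - $-[0];
    }
    return $matchlen;
}

# Variant 2: the /p modifier makes the matched text available in ${^MATCH}
# for this one regex only (assumes Perl 5.10 or later).
sub matchlen_p {
    my ($string) = @_;
    my $matchlen = 0;
    while ($string =~ /[AG]TG.*?T[AG][AG]/gp) {
        $matchlen += length(${^MATCH});
    }
    return $matchlen;
}

# One copy of the test string from the benchmark above.
my $string = "CCATGNNNTAACCNNATGNNTAGCC";
print matchlen_offsets($string), "\n";
print matchlen_p($string), "\n";
```

On the single 25-character test string both subs should print 17: the two matches are "ATGNNNTAA" (9 characters) and "ATGNNTAG" (8 characters). Either sub could be added to the timethese call above next to matchpar for comparison.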

In Section Seekers of Perl Wisdom