http://www.perlmonks.org?node_id=89588


in reply to Unrolling the loop technique

Presumably, because they want to match the strings:

delimiter norma delimiter
delimiter normallll special norma delimiter
delimiter normall special normalllspecial norma delimiter
delimiter normallllll special normal delimiter
delimiter normal special normal delimiter

Without $1 being set to anything (that's what the "?:" after the opening bracket does).

Seriously, give us one or two of the actual regular expressions that are confusing you. The way you have it written, it's exactly equivalent to:

delimiter (?:special|normal)* delimiter

Assuming that each word is supposed to be a regex atom.
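
To make that equivalence concrete, here is a minimal sketch. The atoms are made up for illustration and are not from the original question: ":" is the delimiter, a lowercase letter is "normal", and a digit is "special". The helper name verdicts is likewise hypothetical. It only checks that the alternation form and the unrolled form accept and reject the same sample strings.

```perl
use strict;
use warnings;

# Hypothetical atoms, chosen only for illustration (not from the original post):
# ':' is the delimiter, a lowercase letter is "normal", a digit is "special".
my $delimiter = qr/:/;
my $normal    = qr/[a-z]/;
my $special   = qr/[0-9]/;

# Alternation form: delimiter (?:special|normal)* delimiter
my $alternation = qr/\A $delimiter (?: $special | $normal )* $delimiter \z/x;

# Unrolled form: delimiter normal* (?: special normal* )* delimiter
my $unrolled = qr/\A $delimiter $normal* (?: $special $normal* )* $delimiter \z/x;

# Return a pair of 0/1 flags: (alternation matched, unrolled matched).
sub verdicts {
    my ($string) = @_;
    return (
        ( $string =~ $alternation ? 1 : 0 ),
        ( $string =~ $unrolled    ? 1 : 0 ),
    );
}

# A few accepted and rejected samples; both columns should always agree.
for my $string ( ':abc:', ':ab1cd2:', ':12:', '::', ':abc', ':ab_c:' ) {
    my ( $alt, $unr ) = verdicts($string);
    printf "%-10s alternation=%d unrolled=%d\n", $string, $alt, $unr;
}
```

Note that the unrolled form relies on "special" and "normal" never being able to match at the same position, which holds for these atoms.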

Update: Re-read the question :-} The reason why you'd want to do this would be if "special" is a pattern that matches an escaped delimiter or an escaped escape character, so that you can include the delimiters in the data.
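
As a sketch of that use, assume a double quote as the delimiter and a backslash as the escape character (my choice of atoms, not something the question specified). Then "normal" is any character that is neither a quote nor a backslash, and "special" is a backslash followed by any one character, which covers both \" and \\. The helper first_quoted is a hypothetical name, and this version captures the contents on purpose rather than using "?:" throughout.

```perl
use strict;
use warnings;

# Assumed atoms (illustrative only): '"' is the delimiter, '\' is the escape.
#   normal  = [^"\\]   any character except a quote or a backslash
#   special = \\.      a backslash plus any one character (covers \" and \\)
my $quoted = qr/" ( [^"\\]* (?: \\. [^"\\]* )* ) "/xs;

# Return the contents of the first quoted string in $text, or undef if none.
sub first_quoted {
    my ($text) = @_;
    return $text =~ $quoted ? $1 : undef;
}

# The data contains escaped quotes and an escaped backslash before the
# closing delimiter; the match should still end at the real closing quote.
my $line = 'say "a \"quoted\" word and a backslash \\\\" now';
print first_quoted($line), "\n";
```

Without the "special" branch, the match would stop at the first escaped quote inside the data.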

Update: Someone just suggested that you'd want to unroll it for speed's sake... I recommend doing some speed profiling and seeing for yourself the kind of difference it makes... then deciding whether obfuscating your regular expressions is worth that speed increase. Hint: regular expressions are first compiled to an internal "deterministic acceptor", so it makes very little difference.
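
If you want to try that profiling, one way is the core Benchmark module. This is only a sketch under my own assumptions: the two patterns match double-quoted strings with backslash escapes, and the sample input is invented, so the numbers say nothing about your data or about input that fails to match.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Illustrative patterns only: double-quoted strings with backslash escapes.
my $alternation = qr/"(?:\\.|[^"\\])*"/s;
my $unrolled    = qr/"[^"\\]*(?:\\.[^"\\]*)*"/s;

# Sample input (made up for this sketch): one long quoted string
# containing 200 escaped quotes.
my $input = '"' . ( 'some plain text \\" ' x 200 ) . '"';

# Both patterns should accept the sample before timing means anything.
die "patterns disagree on the sample input"
    unless $input =~ $alternation && $input =~ $unrolled;

# Run each match for at least one CPU second and print a comparison table.
cmpthese( -1, {
    alternation => sub { my $ok = $input =~ $alternation },
    unrolled    => sub { my $ok = $input =~ $unrolled },
} );
```

Results will vary with the perl version and the input, so it is worth rerunning with strings that look like your real data, including ones that should not match.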

Re: Re: Unrolling the loop technique
by chipmunk (Parson) on Jun 19, 2001 at 18:40 UTC
    There may be much more than a little difference. Consider the output from the following program when run under perl5.005:
    #!perl -wl

    use strict;

    my $good = <<EOT;
    a quote " with some text after it and then another quote "
    EOT

    my $bad = <<EOT;
    a quote " with some text after it but without another quote
    EOT

    my $obvious  = qr/"(?:\\.|[^"\\]+)*"/;
    my $unrolled = qr/"[^"\\]*(?:\\.[^"\\]*)*"/;

    $| = 1;

    print "\$good =~ \$unrolled";
    print $good =~ $unrolled;

    print "\$good =~ \$obvious";
    print $good =~ $obvious;

    print "\$bad =~ \$unrolled";
    print $bad =~ $unrolled;

    print "\$bad =~ \$obvious";
    print $bad =~ $obvious;

    print "done";
    The problem should become clear by the time it finishes. ;)

    Jeffrey Friedl goes into much more detail in Mastering Regular Expressions, which is where I grabbed the regexes I used above.

    However, if you run this program under perl5.6, you won't have as much time to figure out the issue, because there are improvements to the regex engine in that version which fix the problem!