http://www.perlmonks.org?node_id=89788


in reply to Re: Unrolling the loop technique
in thread Unrolling the loop technique

In this case /*(.*?)*/ would work but it might be slow (I think, I have not benchmarked it).

I would strongly prefer m#/\*(.*?)\*/#s over the unrolled loop version if that is the whole regex (and I'd try to make that the whole regex precisely because I could then avoid using the unrolled regex).
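
For a concrete comparison, here is a small sketch that runs both forms over the same string. The unrolled pattern is the commonly cited "normal* (special normal*)*" shape for C comments; it is an assumption, since the exact regex from the parent node is not repeated in this reply, and $src is made-up sample text.

```perl
use strict;
use warnings;

# Hypothetical sample text: one comment spanning two lines, one ending in "**/".
my $src = "int x; /* first\n   comment */ int y; /* second **/ int z;";

# Simple non-greedy version; /s lets . match the embedded newline.
my @simple = $src =~ m#(/\*.*?\*/)#sg;

# Unrolled-loop version (assumed pattern): runs of non-stars, then stars,
# repeated until the stars are followed directly by a slash.
my @unrolled = $src =~ m#(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)#g;

print "$_\n" for @simple;
print "both versions agree\n" if "@simple" eq "@unrolled";
```

When the comment regex is the whole regex, the two find the same comments, and the simple one is far easier to read.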

The real problem with this simple technique comes when you try to use it as part of a larger regex. For example, let's say you want to extract "comment blocks", that is, a C-style comment that starts at the beginning of a line and ends at the end of a line. Using m#^/\*(.*?)\*/$#msg sure seems an easy way, and it even works for a lot of cases. However, consider this unlikely sample input:

    /* This is correctly matched */
    /* This: */ runcode(); /* gets included in the "comment" */

which would return this list:

    ( " This is correctly matched ",
      ' This: */ runcode(); /* gets included in the "comment" ' )

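You see that .*? matches as little as possible only among the choices that let the rest of the regex succeed. It will match more whenever that lets the regex as a whole match (or match at an earlier position) where matching less would make it fail (or match later). Here the first */ on the second line is not followed by end-of-line, so the engine backtracks and lets .*? run right across it to the */ that is.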
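
The behavior can be reproduced with a short script. The lazy pattern is the one from this post; the anchored unrolled pattern is an assumed variant (not taken from the thread) added only for contrast, and it captures the whole comment rather than just the body.

```perl
use strict;
use warnings;

# The sample input, as two lines.
my $input = <<'END';
/* This is correctly matched */
/* This: */ runcode(); /* gets included in the "comment" */
END

# Lazy version: .*? is free to run across an inner */ to reach a */ at end of line.
my @lazy = $input =~ m#^/\*(.*?)\*/$#msg;

# Unrolled version (assumed pattern): the body can never contain */,
# so the second line simply fails to match.
my @unrolled = $input =~ m#^(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/)$#mg;

print scalar(@lazy), " lazy matches:\n";
print "  [$_]\n" for @lazy;
print scalar(@unrolled), " unrolled match:\n";
print "  [$_]\n" for @unrolled;
```

The lazy version reports two "comment blocks", the second of which swallows runcode(); the unrolled one reports only the first line.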

If I find myself wanting to use the loop unrolling technique, then I usually try to rework the problem by parsing in smaller chunks. If those chunks start getting really small, though (say, my parser ends up dealing with single characters in lots of cases), then I may use some of the simplest examples of unrolled regex loops.
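
One way to read "parsing in smaller chunks" for the comment-block example is sketched below: keep the simple comment regex as the whole regex, then test the line-boundary conditions in plain Perl. This is only an illustration under that assumption; the variable names are made up, and it ignores things like /* inside string literals.

```perl
use strict;
use warnings;

my $input = <<'END';
/* This is correctly matched */
/* This: */ runcode(); /* gets included in the "comment" */
END

my @blocks;
# Step 1: the simple regex is the whole regex, so .*? stops at the first */.
while ($input =~ m#/\*(.*?)\*/#sg) {
    # $-[0] and $+[0] are the start and end offsets of the last match.
    my ($body, $start, $end) = ($1, $-[0], $+[0]);

    # Step 2: check the line-boundary conditions outside the regex.
    my $at_line_start = $start == 0 || substr($input, $start - 1, 1) eq "\n";
    my $at_line_end   = $end == length($input) || substr($input, $end, 1) eq "\n";

    push @blocks, $body if $at_line_start && $at_line_end;
}
print "[$_]\n" for @blocks;
```

Because nothing follows \*/ inside the regex, there is nothing to tempt .*? into matching more, and only the first comment is kept as a block.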

        - tye (but my friends call me "Tye")