PerlMonks  

Useful Uses of regex code expressions?

by kvale (Monsignor)
on Jun 26, 2002 at 05:44 UTC (#177306=perlmeditation)

Back when I wrote perlretut, code expressions (?{code}) and (??{code}) were relatively new beasts in the regex menagerie. I thought they were clever, and the variable backtracking was a little bit magical.

The trouble was, I couldn't come up with any examples where code expressions really nailed the problem. So for the tutorial I created a cute, but useless, example of matching Fibonacci strings:
    # detects if a binary string 1101010010001...
    # has a Fibonacci spacing 0,1,1,2,3,5,... of the 1's
    $x = "1101010010001000001";
    $s0 = 0; $s1 = 1; # initial conditions
    print "It is a Fibonacci sequence\n"
        if $x =~ /^1         # match an initial '1'
                    (
                       (??{'0' x $s0}) # match $s0 of '0'
                       1               # and then a '1'
                       (?{
                          $largest = $s0;   # largest seq so far
                          $s2 = $s1 + $s0;  # compute next term
                          $s0 = $s1;        # in Fibonacci sequence
                          $s1 = $s2;
                         })
                    )+   # repeat as needed
                  $      # that is all there is
                 /x;
    print "Largest sequence matched was $largest\n";
This prints
    It is a Fibonacci sequence
    Largest sequence matched was 5
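The "variable backtracking" mentioned above works through local: an assignment made with local inside (?{ }) is undone when the engine backtracks past that code block, while a plain assignment (like the ones in the Fibonacci example) is not. A minimal sketch, adapted from the counting example in perlre; the variable names are illustrative:

```perl
use strict;
use warnings;

our ($cnt, $res_local, $res_plain);
my $str = 'a' x 8;

# With local: each iteration's increment is undone when the engine
# backtracks out of that iteration, so $cnt tracks the iterations kept.
$str =~ m{
    (?{ $cnt = 0 })
    ( a (?{ local $cnt = $cnt + 1 }) )*
    aaaa
    (?{ $res_local = $cnt })
}x;

# Without local: the increments from the greedy first pass survive
# backtracking, so the count overshoots.
$str =~ m{
    (?{ $cnt = 0 })
    ( a (?{ $cnt = $cnt + 1 }) )*
    aaaa
    (?{ $res_plain = $cnt })
}x;

print "with local: $res_local, without local: $res_plain\n";  # with local: 4, without local: 8
```

The greedy ( a ... )* first eats all eight a's and must give four back so that aaaa can match; only the localized counter follows it back down to 4.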

Since then, I have written many regexes, but haven't come across any code that needed code expressions, as opposed to a combination of simpler regexes and perl code.

My question is: Has anyone found a good use for code expressions in their regex work?

-Mark

Re: Useful Uses of regex code expressions?
by Anonymous Monk on Jun 26, 2002 at 06:37 UTC
    Well, dealing with nested parens is one example (already in perlre) that you might expand on.
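    A minimal sketch of that approach, adapted from the balanced-parentheses example in perlre: the pattern refers to itself through (??{ $balanced }), and each time the engine reaches that point the current qr// object is inserted as a sub-pattern. The variable name $balanced and the test strings are illustrative.

```perl
use strict;
use warnings;

# $balanced must be declared before the qr// is compiled so the
# (??{ $balanced }) block can refer to it; the block is evaluated each
# time the engine reaches it, which is what makes the recursion work.
our $balanced;
$balanced = qr{
    \(
    (?:
        (?> [^()]+ )       # a run of non-parens, no backtracking into it
      |
        (??{ $balanced })  # or a nested, balanced group
    )*
    \)
}x;

for my $s ('(a(b)c)', '(a(b c)', '((x)(y(z)))') {
    printf "%-12s %s\n", $s, ($s =~ /^$balanced$/ ? 'balanced' : 'not balanced');
}
```

The atomic group (?> ... ) keeps the engine from trying every way of splitting a run of non-paren characters when a match fails, which would otherwise make failures very slow.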
Re: Useful Uses of regex code expressions?
by jryan (Vicar) on Jun 26, 2002 at 23:22 UTC

    The reason (?{}) and (??{}) aren't used too often is that they are extremely powerful, and most data parsing simply doesn't need that kind of power (or, when it does, most people will turn to Parse::RecDescent). However, I've found the code expression ability to be quite helpful in employing a few slick tricks, such as:

    1. Conditional Insert: (??{(cond) ? yes : no})

      I'm not talking about (?(cond) yes|no), of course; (?(cond) yes|no) will only take a backreference or a zero-width assertion as its condition. (??{(cond) ? yes : no}) will similarly cause the appropriate pattern to be inserted based on the evaluation of the condition; however, the condition is no longer limited. This is useful in cases where you don't want to fail when you are using (?(cond)yes) or (?cond), or when you are mucking with much weirder things... (see my homenode for an example)

    2. Partial Backreferences: (??{substr($string_that_is_matched_against,$+[$#+]-1,1)})

      The snippet above inserts the last character that was matched (by the last capture group) at the point where the regex reaches it. The above is the general case, useful if you are playing with recursion; if you know which $DIGIT variable you need, you can always do:

      (??{substr($DIGIT,-1)})

    3. Dynamically Building Character Classes: (??{"[\Q$letters\E]*"})

      Variables normally can't be interpolated within character classes. With (??{}), now you can.

    4. Matching/extracting arbitrarily nested elements with recursion: (??{$compiled_regex})

      As the anonymous monk already said, (??{}) is a necessity for matching nested sets of parens/brackets/etc. if you don't want to go crawling through the string C-style. I go into more detail, using a less basic example than the one in perlre, here.
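    The first three tricks can be tried out in one script. This is a minimal sketch assuming a reasonably modern Perl (5.18 or later, where code blocks in literal patterns are parsed as ordinary Perl); the formats being matched and the names $tagged, $chain and $from_set are hypothetical. For trick 3 it builds the class from a capture made earlier in the same match, and uses quotemeta rather than \Q so the quoting happens inside the code block.

```perl
use strict;
use warnings;

# 1. Conditional insert: an ordinary Perl ?: picks the sub-pattern at match time.
#    Hypothetical format: a radix tag (d or x), a colon, then digits.
my $tagged = qr{
    ^ ([dx]) :
    (??{ $1 eq 'x' ? '[0-9a-fA-F]+' : '[0-9]+' })
    \z
}x;

# 2. Partial backreference: the second word must start with the
#    last character of the first word (a simple "word chain").
my $chain = qr{ ^ (\w+) \s+ (??{ quotemeta substr($1, -1) }) \w* \z }x;

# 3. Character class built from text captured earlier in the same match:
#    after "letters=", accept only characters that appear in the letters.
my $from_set = qr{ ^ (\w+) = (??{ '[' . quotemeta($1) . ']+' }) \z }x;

printf "%-12s %s\n", $_, ($_ =~ $tagged   ? 'ok' : 'no') for qw(d:42 x:1f d:1f);
printf "%-12s %s\n", $_, ($_ =~ $chain    ? 'ok' : 'no') for 'apple egg', 'apple orange';
printf "%-12s %s\n", $_, ($_ =~ $from_set ? 'ok' : 'no') for qw(abc=cabba abc=abd);
```

Here d:42, x:1f, "apple egg" and abc=cabba match, while d:1f, "apple orange" and abc=abd do not. In each case the string returned by the code block is compiled as a regex on the spot, so it can depend on anything Perl can compute, including $1.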

    I have other "interesting" regexes on my homenode, and many involve our friends (??{}) and (?{}). Take a look if you don't feel like sleeping well tonight.

      (??{}) is handy for matching nested sets of parens/brackets/etc. without crawling through the string C-style, but it is not a necessity.
        True; but rolling with your flow, regexes themselves aren't really a necessity - only something that happens to be handy for when you don't feel like crawling through a string C-style.
      Very nice! I especially like the partial backreferences.

      It is possible to dynamically build character classes the ordinary way:
      foreach my $char ('a'..'d') {
          print "bbb matches $char\n" if "bbb" =~ /[$char]/;
      }
      prints
      bbb matches b
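    Ordinary interpolation does work inside a character class, but the variable's contents are read as class syntax, so a '-' in the variable forms a range. Quoting with \Q...\E makes every character literal. A small sketch; $chars and its value are hypothetical:

```perl
use strict;
use warnings;

my $chars = 'a-c';   # hypothetical set of letters that happens to contain '-'

my $as_range   = ('b' =~ /[$chars]/)     ? 1 : 0;  # [a-c] is a range, so 'b' matches
my $as_literal = ('b' =~ /[\Q$chars\E]/) ? 1 : 0;  # [a\-c] holds only 'a', '-', 'c'
my $dash_ok    = ('-' =~ /[\Q$chars\E]/) ? 1 : 0;  # the literal '-' is in the class

print "range: $as_range, literal: $as_literal, dash: $dash_ok\n";  # range: 1, literal: 0, dash: 1
```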

      Thanks for expanding my horizons!

      -Mark
      A few corrections:

      1. The (?(...)true|false) can also take a (?{ code }) block as its condition.

      3. Uh, /[\Q$foo\E]/ works for me. The problem comes when you want to employ variables that are created as the regex matches (like the digit variables). Then it comes in handy. For example, /.(??{ "[^\Q$&\E]" })*/s finds a string of characters with no repeats.
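    A sketch of correction 1: a (?{ code }) block used as the condition of (?(...)yes|no), where the block's return value decides which branch is tried. The number/"big"/"small" format is made up for illustration:

```perl
use strict;
use warnings;

# (?(?{ CODE })yes|no): the code block's value is the condition.
# Hypothetical format: a number followed by "big" when it exceeds 100,
# otherwise by "small".
my $sized = qr{ ^ (\d+) (?(?{ $1 > 100 }) big | small ) \z }x;

printf "%-10s %s\n", $_, ($_ =~ $sized ? 'ok' : 'no') for qw(150big 50small 50big 150small);
```

Note that when the chosen branch fails, the engine backtracks into (\d+) and re-evaluates the condition for the shorter number, so 150small is still rejected only after 15 and 1 have also been tried.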

      You can see examples of code assertions in my book, whenever the hell I finish it.

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

        Regarding 1: That most certainly is an undocumented feature. According to perlre:

        (?(condition)yes-pattern|no-pattern)
        (?(condition)yes-pattern)
            Conditional expression. (condition) should be either an integer in parentheses (which is valid if the corresponding pair of parentheses matched), or look-ahead/look-behind/evaluate zero-width assertion.

        Update: I saw, and read that phrase; however, my brain hates me and skipped processing it correctly ;(

Node Type: perlmeditation [id://177306]
Approved by FoxtrotUniform