Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine

Answer: What can regular expressions NOT do?

by jbn (Initiate)
on Apr 27, 2000 at 01:48 UTC ( #9335=categorized answer: print w/replies, xml ) Need Help??

Q&A > regular expressions > What can regular expressions NOT do? - Answer contributed by jbn

you cannot write a regexp that will match any level of nested balanced parentheses. You can always write one to match a specific number of balanced ones, but not one that can match a non- fixed number of these. I think there is an explaination in the Friedl book.

  • Comment on Answer: What can regular expressions NOT do?
Replies are listed 'Best First'.
RE: Answer: What can regular expressions NOT do?
by maverick (Curate) on Jun 27, 2000 at 19:40 UTC
    Let me see if I can make this make sense with digging out a text book...

    Regular expressions are implemented internally by a 'finite state automata'.

    If you've never heard the term, I'll attempt to explain it... Picture a group of circles interconnected by various lines. The circles represent the current state, and the lines represent the next state to go to if a give input is seen. One circle is the start state, and some other number of circles are 'end' states (you can have more than one).

    For a specific example try this:
    take three circles, label them 'start', '1', and 'end'
    draw an arrow from 'start' to '1', from '1' to 'end' and from 'end' to '1'
    label the arrows 'a','b' and 'c' respectively.

    Starting at 'start', take each character of input and follow the link with that label to the next state.
    If at the end of the input you're at the 'end' state, the this automata matches the input.
    If you have a character of input that you don't have a link for from you're current state, or you run out of input and aren't on a 'end' state, the the automata doesn't match the input.
    Using our example the following will match: 'abc','abcbc','abcbcbc'.
    And these will not: 'a', 'ab','abcb','abcd','ad', etc.
    (The regular expression for this automata would be /^a(bc)+$/)

    At each step the only thing the automata is concerned with is what state it is in, and what is the next character of input. The is no retained knowledge of what the previous characters were. Since finite state automatas have no 'memory' of what input they've seen before, they have no way of knowing if the correct number of ')' has been found.

    Hopefully that made sense, but was probably FMTYWTK

      Well, I actually asked for something similar ages ago on comp.lang.perl.moderated (I wanted to match balanced braces in C++ code) and was treated to the following solution which works quite handily on arbitrary-depth nesting for parens.

      $string = "((()()())"; # one unbalanced paren ($re = "\Q$string") =~ s/\\(\()|\\(\))/$1\\$1$2$2/g; my @a = eval { $string =~ $re }; die "Mismatched brackets in '$string'\n" if $@;


Log In?

What's my password?
Create A New User
and God said, "Let Newton be!"...

How do I use this? | Other CB clients
Other Users?
Others pondering the Monastery: (9)
As of 2017-06-26 17:44 GMT
Find Nodes?
    Voting Booth?
    How many monitors do you use while coding?

    Results (583 votes). Check out past polls.