Let me see if I can make this make sense without digging out a textbook...
Regular expressions are implemented internally by a 'finite state automaton'.
If you've never heard the term, I'll attempt to explain it...
Picture a group of circles interconnected by various lines. The circles represent
the current state, and the lines represent the next state to go to if a given input
is seen. One circle is the start state, and one or more other circles are 'end'
states (you can have more than one).
For a specific example try this:
take three circles, label them 'start', '1', and 'end'
draw an arrow from 'start' to '1', from '1' to 'end' and from 'end' to '1'
label the arrows 'a','b' and 'c' respectively.
Starting at 'start', take each character of input and follow the arrow with that
label to the next state.
If you're on the 'end' state when the input runs out, then this automaton matches
the input.
If you hit a character that has no arrow leading out of your current state, or you
run out of input while not on an 'end' state, then the automaton doesn't match the input.
Using our example the following will match: 'ab', 'abcb', 'abcbcb'.
And these will not: 'a', 'abc', 'abcbc', 'abd', 'ad', etc.
(The regular expression for this automaton would be /^ab(cb)*$/)
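If it helps to see the circles and arrows as code, here is a rough sketch that walks the three-circle diagram exactly as drawn above (a: start to 1, b: 1 to end, c: end to 1). The names %next_state, %is_end and fsa_matches are ones I made up for illustration; this is not how Perl's regex engine is actually written.

```perl
use strict;
use warnings;

# Transition table for the three-circle example:
#   start --a--> 1,   1 --b--> end,   end --c--> 1
my %next_state = (
    start => { a => '1'   },
    '1'   => { b => 'end' },
    end   => { c => '1'   },
);
my %is_end = ( end => 1 );

# Hypothetical helper: returns 1 if the automaton accepts $input, else 0.
sub fsa_matches {
    my ($input) = @_;
    my $state = 'start';
    for my $char ( split //, $input ) {
        # No arrow with this label from the current state: no match.
        return 0 unless exists $next_state{$state}{$char};
        $state = $next_state{$state}{$char};
    }
    # Out of input: match only if we stopped on an 'end' state.
    return $is_end{$state} ? 1 : 0;
}

for my $input (qw(ab abcb a abc abd)) {
    printf "%-5s %s\n", $input,
        fsa_matches($input) ? 'matches' : 'does not match';
}
```

Note that the loop only ever looks at $state and the current character, which is the whole point of the next paragraph.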
At each step the only things the automaton is concerned with are what state it is in
and what the next character of input is. There is no retained knowledge of what the
previous characters were. Since finite state automata have no 'memory' of what
input they've seen before, they have no way of knowing whether the correct number of ')'
has been found.
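For contrast, here is a minimal sketch of what counting parens needs: a counter that can grow as large as the input demands, which is exactly the kind of memory a fixed set of circles can't provide. The name parens_balanced is hypothetical, and this is just one straightforward way to do it.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every '(' has a matching ')', else 0.
sub parens_balanced {
    my ($input) = @_;
    my $depth = 0;    # the "memory" a finite state automaton lacks
    for my $char ( split //, $input ) {
        if ( $char eq '(' ) {
            $depth++;
        }
        elsif ( $char eq ')' ) {
            return 0 if --$depth < 0;    # a ')' with no '(' before it
        }
    }
    return $depth == 0 ? 1 : 0;
}

print parens_balanced('((()()())') ? "balanced\n" : "unbalanced\n";
```

An automaton could fake this for nesting up to some fixed depth by adding one circle per depth level, but there is always a deeper input it can't handle.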
Hopefully that made sense, though it was probably FMTYEWTK
/\/\averick
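my $string = "((()()())";    # one unbalanced paren

# Escape the whole string, then turn each escaped paren into a real regex
# group delimiter plus the literal paren:  \(  becomes  (\(  and  \)  becomes  \))
# The resulting pattern compiles only if the parens in $string balance.
(my $re = "\Q$string") =~ s/\\(\()|\\(\))/$1\\$1$2$2/g;
eval { $string =~ /$re/ };    # an unbalanced pattern dies when it is compiled
die "Mismatched brackets in '$string'\n" if $@;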
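If you want to reuse the trick, one possible way to wrap it is sketched below. The sub name compiles_balanced is made up, and I've swapped the match for a plain qr// compile and the substitution for an /e version so it stays quiet under warnings; treat it as a variation on the snippet above, not a tested module.

```perl
use strict;
use warnings;

# Hypothetical wrapper: returns 1 if the parens in $string balance, else 0.
sub compiles_balanced {
    my ($string) = @_;

    # \(  becomes  (\(   and   \)  becomes  \))
    (my $re = quotemeta $string)
        =~ s{\\([()])}{ $1 eq '(' ? '(\\(' : '\\))' }ge;

    # Compiling an unbalanced pattern dies; eval turns that into false.
    return eval { my $compiled = qr/$re/; 1 } ? 1 : 0;
}

for my $string ( '((()()())', '(()())', ')(' ) {
    printf "%-10s %s\n", $string,
        compiles_balanced($string) ? 'balanced' : 'mismatched';
}
```

This only works because Perl's regex compiler keeps track of nesting while it parses the pattern; the matching itself plays no part in the check.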
-J.