Re: What can regular expressions NOT do?

Replies are listed 'Best First'.
RE: Answer: What can regular expressions NOT do? by maverick (Curate) on Jun 27, 2000 at 19:40 UTC
Let me see if I can make this make sense with digging out a text book... Regular expressions are implemented internally by a 'finite state automata'. If you've never heard the term, I'll attempt to explain it... Picture a group of circles interconnected by various lines. The circles represent the current state, and the lines represent the next state to go to if a give input is seen. One circle is the start state, and some other number of circles are 'end' states (you can have more than one). For a specific example try this: take three circles, label them 'start', '1', and 'end' draw an arrow from 'start' to '1', from '1' to 'end' and from 'end' to '1' label the arrows 'a','b' and 'c' respectively. Starting at 'start', take each character of input and follow the link with that label to the next state. If at the end of the input you're at the 'end' state, the this automata matches the input. If you have a character of input that you don't have a link for from you're current state, or you run out of input and aren't on a 'end' state, the the automata doesn't match the input. Using our example the following will match: 'abc','abcbc','abcbcbc'. And these will not: 'a', 'ab','abcb','abcd','ad', etc. (The regular expression for this automata would be /^a(bc)+$/) At each step the only thing the automata is concerned with is what state it is in, and what is the next character of input. The is no retained knowledge of what the previous characters were. Since finite state automatas have no 'memory' of what input they've seen before, they have no way of knowing if the correct number of ')' has been found. Hopefully that made sense, but was probably FMTYWTK /\/\averick	[reply]
Re: RE: Answer: What can regular expressions NOT do? by Joey The Saint (Novice) on Mar 20, 2001 at 20:48 UTC
Well, I actually asked for something similar ages ago on comp.lang.perl.moderated (I wanted to match balanced braces in C++ code) and was treated to the following solution which works quite handily on arbitrary-depth nesting for parens. `$string = "((()()())"; # one unbalanced paren ($re = "\Q$string") =~ s/\$\()\|\\($)/$1\\$1$2$2/g; my @a = eval { $string =~ $re }; die "Mismatched brackets in '$string'\n" if $@;` [download] -J.	[reply] [d/l]