Contributed by kryten
on Apr 23, 2000 at 02:16 UTC
Q&A
> regular expressions
Description: It seems like perl regexps can match
pretty much anything, and can even be used
for some mathematical trickery
So what is it that they cannot do?
Is there anything that is actually impossible
to match? Answer: What can regular expressions NOT do? contributed by turnstep Regexps are bad at very complicated cases, especially when
things are nested. The typical example is HTML: it's easy enough
to make a simple regexp to grab tags, but to get it really right
and account for all cases (comments, greater than signs within quotes
within HTML tags, multiline tags, etc.) you outgrow the usefullness
of a regular expression and really need something that will
actually parse the string (or file, whatever) that contains
the HTML. Luckily, there are modules that will do just that
for you.
| Answer: What can regular expressions NOT do? contributed by buzzcutbuddha The question is not so much 'What Can't Regexp Do?', but when shouldn't you use it.
There are times when processing strings where it is easier and faster to use a substr
to grab what you need from input. This is especially true if you know that the length
of the input elements does not change (ie a fixed-width flat text file...). I guess the
analogy would be that if you wanted to dig postholes for a fence, you could use a shovel,
which is powerful and has many different ways to use, but is not as efficient as a posthole
digger. Don't get me wrong, I love regexps, but sometimes, they are not the best tool for
the job. :) | Answer: What can regular expressions NOT do? contributed by kryten It's just like the old adage:
If all you have is a regexp, then everything
looks like a /^[Nn]ail$/
:)
| Answer: What can regular expressions NOT do? contributed by jbn you cannot write a regexp that will match any
level of nested balanced parentheses. You can
always write one to match a specific number of
balanced ones, but not one that can match a non-
fixed number of these. I think there is an
explaination in the Friedl book. | Answer: What can regular expressions NOT do? contributed by ton Also cannot solve the "dangling else" problem. That is, if you're checking a piece of code with a bunch of "if" statements which may or may not be followed by "else" statements, it is impossible to write a regexp that will find all occurances of "elses" without matching "ifs" (e.g. the point at which the number of elses exceeds the number of ifs). Note that this is trivially easy to do with a loop.
There are a bunch more things also; have to dig up my compiler books to list them, though :) | Answer: What can regular expressions NOT do? contributed by Anonymous Monk It is because regular expressions merely have the power of finite state machines. These finite state machines have NO memory.... how could it match equal numbers of an indeterminate quantity without MEMORY. It can't.
That's why you would need the power of a more expressive grammar ... that is that which can be expressed by a pushdown automota. With a pushdown, we have a "stack" of memory that we can save about each state. Clearly, you can see how the matched parenthesis problem becomes very simple with such a mechanism at hand. | Answer: What can regular expressions NOT do? contributed by I0 But Perl's m// operator is a more expressive grammar than formal regular expressions.
For example, \1 allows you to detect primes:
print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/
And s/// can be used to detect balanced parenthesis:
$_ = '(()(()()))';
(my $re=$_)=~s/((\()|(\))|.)/${[')','']}[!$3]\Q$1\E${['(','']}[!$2]/gs
+;
eval{/$re/};
print "Balanced" unless $@ =~ /unmatched/;
|
Please (register and) log in if you wish to add an answer
Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
Read Where should I post X? if you're not absolutely sure you're posting in the right place.
Please read these before you post! —
Posts may use any of the Perl Monks Approved HTML tags:
- a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
|
For: |
|
Use: |
| & | | & |
| < | | < |
| > | | > |
| [ | | [ |
| ] | | ] |
Link using PerlMonks shortcuts! What shortcuts can I use for linking?
See Writeup Formatting Tips and other pages linked from there for more info.
|
|