Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

What can regular expressions NOT do?

by kryten (Scribe)
on Apr 23, 2000 at 02:16 UTC ( #8624=categorized question: print w/replies, xml ) Need Help??
Contributed by kryten on Apr 23, 2000 at 02:16 UTC
Q&A  > regular expressions


It seems like perl regexps can match pretty much anything, and can even be used for some mathematical trickery So what is it that they cannot do? Is there anything that is actually impossible to match?

Answer: What can regular expressions NOT do?
contributed by turnstep

Regexps are bad at very complicated cases, especially when things are nested. The typical example is HTML: it's easy enough to make a simple regexp to grab tags, but to get it really right and account for all cases (comments, greater than signs within quotes within HTML tags, multiline tags, etc.) you outgrow the usefullness of a regular expression and really need something that will actually parse the string (or file, whatever) that contains the HTML. Luckily, there are modules that will do just that for you.

Answer: What can regular expressions NOT do?
contributed by buzzcutbuddha

The question is not so much 'What Can't Regexp Do?', but when shouldn't you use it.
There are times when processing strings where it is easier and faster to use a substr
to grab what you need from input. This is especially true if you know that the length
of the input elements does not change (ie a fixed-width flat text file...). I guess the
analogy would be that if you wanted to dig postholes for a fence, you could use a shovel,
which is powerful and has many different ways to use, but is not as efficient as a posthole
digger. Don't get me wrong, I love regexps, but sometimes, they are not the best tool for
the job. :)

Answer: What can regular expressions NOT do?
contributed by kryten

It's just like the old adage:
If all you have is a regexp, then everything looks like a /^[Nn]ail$/

Answer: What can regular expressions NOT do?
contributed by jbn

you cannot write a regexp that will match any level of nested balanced parentheses. You can always write one to match a specific number of balanced ones, but not one that can match a non- fixed number of these. I think there is an explaination in the Friedl book.

Answer: What can regular expressions NOT do?
contributed by ton

Also cannot solve the "dangling else" problem. That is, if you're checking a piece of code with a bunch of "if" statements which may or may not be followed by "else" statements, it is impossible to write a regexp that will find all occurances of "elses" without matching "ifs" (e.g. the point at which the number of elses exceeds the number of ifs). Note that this is trivially easy to do with a loop.

There are a bunch more things also; have to dig up my compiler books to list them, though :)

Answer: What can regular expressions NOT do?
contributed by Anonymous Monk

It is because regular expressions merely have the power of finite state machines. These finite state machines have NO memory.... how could it match equal numbers of an indeterminate quantity without MEMORY. It can't. That's why you would need the power of a more expressive grammar ... that is that which can be expressed by a pushdown automota. With a pushdown, we have a "stack" of memory that we can save about each state. Clearly, you can see how the matched parenthesis problem becomes very simple with such a mechanism at hand.

Answer: What can regular expressions NOT do?
contributed by I0

But Perl's m// operator is a more expressive grammar than formal regular expressions.
For example, \1 allows you to detect primes:
print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/
And s/// can be used to detect balanced parenthesis:

$_ = '(()(()()))'; (my $re=$_)=~s/((\()|(\))|.)/${[')','']}[!$3]\Q$1\E${['(','']}[!$2]/gs +; eval{/$re/}; print "Balanced" unless $@ =~ /unmatched/;

Please (register and) log in if you wish to add an answer

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others romping around the Monastery: (4)
    As of 2018-07-21 16:01 GMT
    Find Nodes?
      Voting Booth?
      It has been suggested to rename Perl 6 in order to boost its marketing potential. Which name would you prefer?

      Results (449 votes). Check out past polls.