Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

What can regular expressions NOT do?

( #8624=categorized question: print w/ replies, xml ) Need Help??
Contributed by kryten on Apr 23, 2000 at 02:16 UTC
Q&A  > regular expressions


Description:

It seems like perl regexps can match pretty much anything, and can even be used for some mathematical trickery So what is it that they cannot do? Is there anything that is actually impossible to match?

Answer: What can regular expressions NOT do?
contributed by turnstep

Regexps are bad at very complicated cases, especially when things are nested. The typical example is HTML: it's easy enough to make a simple regexp to grab tags, but to get it really right and account for all cases (comments, greater than signs within quotes within HTML tags, multiline tags, etc.) you outgrow the usefullness of a regular expression and really need something that will actually parse the string (or file, whatever) that contains the HTML. Luckily, there are modules that will do just that for you.

Answer: What can regular expressions NOT do?
contributed by buzzcutbuddha

The question is not so much 'What Can't Regexp Do?', but when shouldn't you use it.
There are times when processing strings where it is easier and faster to use a substr
to grab what you need from input. This is especially true if you know that the length
of the input elements does not change (ie a fixed-width flat text file...). I guess the
analogy would be that if you wanted to dig postholes for a fence, you could use a shovel,
which is powerful and has many different ways to use, but is not as efficient as a posthole
digger. Don't get me wrong, I love regexps, but sometimes, they are not the best tool for
the job. :)

Answer: What can regular expressions NOT do?
contributed by kryten

It's just like the old adage:
If all you have is a regexp, then everything looks like a /^[Nn]ail$/
:)

Answer: What can regular expressions NOT do?
contributed by jbn

you cannot write a regexp that will match any level of nested balanced parentheses. You can always write one to match a specific number of balanced ones, but not one that can match a non- fixed number of these. I think there is an explaination in the Friedl book.

Answer: What can regular expressions NOT do?
contributed by ton

Also cannot solve the "dangling else" problem. That is, if you're checking a piece of code with a bunch of "if" statements which may or may not be followed by "else" statements, it is impossible to write a regexp that will find all occurances of "elses" without matching "ifs" (e.g. the point at which the number of elses exceeds the number of ifs). Note that this is trivially easy to do with a loop.

There are a bunch more things also; have to dig up my compiler books to list them, though :)

Answer: What can regular expressions NOT do?
contributed by Anonymous Monk

It is because regular expressions merely have the power of finite state machines. These finite state machines have NO memory.... how could it match equal numbers of an indeterminate quantity without MEMORY. It can't. That's why you would need the power of a more expressive grammar ... that is that which can be expressed by a pushdown automota. With a pushdown, we have a "stack" of memory that we can save about each state. Clearly, you can see how the matched parenthesis problem becomes very simple with such a mechanism at hand.

Answer: What can regular expressions NOT do?
contributed by I0

But Perl's m// operator is a more expressive grammar than formal regular expressions.
For example, \1 allows you to detect primes:
print "Prime" if (1 x shift) !~ /^1?$|^(11+?)\1+$/
And s/// can be used to detect balanced parenthesis:

$_ = '(()(()()))'; (my $re=$_)=~s/((\()|(\))|.)/${[')','']}[!$3]\Q$1\E${['(','']}[!$2]/gs +; eval{/$re/}; print "Balanced" unless $@ =~ /unmatched/;

Please (register and) log in if you wish to add an answer



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others taking refuge in the Monastery: (8)
    As of 2014-08-21 18:22 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      The best computer themed movie is:











      Results (141 votes), past polls