Quantifiers in regular expressions

by root (Monk)
on Nov 10, 1999 at 05:58 UTC ( #967=perltutorial )

Another useful thing is being able to match a certain number of characters that meet a certain criterion. There are two types of searches you can do when you're dealing with quantifiers: greedy and nongreedy, or in other terms, maximal and minimal. A maximal or greedy search tries to match as many characters as it can while still letting the overall pattern match. So if we were looking for 1 to 4 b's in a row and had a string with 3 b's in a row, we would match all 3 b's. If it were a minimal or nongreedy search, only the first b would be matched.
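
The "1 to 4 b's" case above can be tried directly. This is only a small sketch; the sample string and variable names are made up for illustration and are not part of the original tutorial.

```perl
use strict;
use warnings;

# Hypothetical sample string containing three b's in a row.
my $string = "aabbbcc";

# Greedy: {1,4} takes as many b's as it can (here, all three).
my ($greedy) = $string =~ /(b{1,4})/;

# Nongreedy: {1,4}? stops as soon as the minimum of one b is met.
my ($nongreedy) = $string =~ /(b{1,4}?)/;

print "greedy:    '$greedy'\n";      # prints: greedy:    'bbb'
print "nongreedy: '$nongreedy'\n";   # prints: nongreedy: 'b'
```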

Here's a table that sums up quantifiers.

Greedy   Nongreedy   Allowed numbers for a match
?        ??          0 or 1 time
+        +?          1 or more times
*        *?          0 or more times
{i}      {i}?        exactly i times
{i,}     {i,}?       i or more times
{i,j}    {i,j}?      at least i times and at most j times

Obviously Perl needs to know what these quantifiers are referring to. A quantifier applies to the single item directly to its left (one character, for example) unless parentheses are used to group several characters together.

/b{3}/      #matches three b's
/(ha){3}/   #matches hahaha
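
To see why the parentheses matter, here is a rough sketch that runs the same quantifier with and without grouping. The test strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Without parentheses, {3} applies only to the character just before it.
my $three_b   = "abbbc"  =~ /b{3}/    ? 1 : 0;   # three b's in a row: matches
my $ha_single = "hahaha" =~ /ha{3}/   ? 1 : 0;   # needs "haaa": no match
my $haaa      = "haaa"   =~ /ha{3}/   ? 1 : 0;   # h followed by three a's: matches

# With parentheses, {3} applies to the whole group "ha".
my $ha_group  = "hahaha" =~ /(ha){3}/ ? 1 : 0;   # "ha" three times: matches

print "b{3} on abbbc:     $three_b\n";     # prints 1
print "ha{3} on hahaha:   $ha_single\n";   # prints 0
print "ha{3} on haaa:     $haaa\n";        # prints 1
print "(ha){3} on hahaha: $ha_group\n";    # prints 1
```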

Replies are listed 'Best First'.
RE: Quantifiers in regular expressions
by Anonymous Monk on Apr 23, 2000 at 01:35 UTC

    As a suggestion, this is something I had trouble with when I was learning regex. Non-greedy not only matches as little as possible, but it also *doesn't backtrack*. This is sort of an example that gave me trouble until I read about backtracking in the man page.

    $string = "<foo>...</foo><bar>...</bar>";
    # This matches <foo></bar>, not what we want.
    $string =~ /\<.*\>(.*)\</.*\>/;
    # this works, i think.
    $string =~ /\<.*?\>(.*?)\</.*?\>/;

    Even if it doesn't work, I hope you get the idea. I'd show the exact example of what I was doing when I ran into this, but it's overly-complicated (removing parts of tags from a string, where there's a list of tags and attributes for those tags that need to be removed).

    Greediness relies a lot on backtracking, so to avoid frustrating another fledgling Perl coder with the same problems, it's at least worth a note wherever greediness is an issue.

      I don't understand this:
      $string = "<foo>...</foo><bar>...</bar>";
      # This matches <foo></bar>, not what we want.
      $string =~ /\<.*\>(.*)\</.*\>/;
      How does it match? And I thought that the matching operator was m//, not m///. What does m/// do?
        Yeah, AM escapes the wrong chars. Let's put it right. If you say:
        $string =~ /<.*>(.*)<\/.*>/;
        on a string "<foo>...</foo><bar>...</bar>" the match runs all the way from "<foo>" to "</bar>", because the "*" quantifier is "greedy", which means it tries to match as much as possible. A "." in a regex matches any character except a newline, so ".*" first matches to the end of the string. Then the rest of the regex is evaluated by backtracking: the regex engine is now at the end of the string and gives back one character at a time until the rest of the pattern matches. I hope this was correct.

        Damn HTML escaping :) I fixed it, so the strings actually show up.
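
        The greedy behaviour described above can be compared with the nongreedy version side by side. This is only a sketch: the tag bodies "one" and "two" are made up so the two results can be told apart, and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample text; the two tag bodies differ on purpose.
my $string = "<foo>one</foo><bar>two</bar>";

my ($greedy_all, $greedy_inner, $lazy_all, $lazy_inner);

# Greedy: each .* grabs as much as it can, then backtracks.
if ($string =~ /<.*>(.*)<\/.*>/) {
    ($greedy_all, $greedy_inner) = ($&, $1);
}

# Nongreedy: each .*? grabs as little as it can, then extends.
if ($string =~ /<.*?>(.*?)<\/.*?>/) {
    ($lazy_all, $lazy_inner) = ($&, $1);
}

print "greedy matched '$greedy_all', captured '$greedy_inner'\n";
# greedy matched '<foo>one</foo><bar>two</bar>', captured 'two'
print "nongreedy matched '$lazy_all', captured '$lazy_inner'\n";
# nongreedy matched '<foo>one</foo>', captured 'one'
```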
        This is matching tag delimiters and the text that belongs to them. Let's say you have text in H1 format, <h1>This is my text</h1>, and you want to replace both the heading and the text in only one step. Then you'll need a pattern like the one mentioned above to find a matching pair.
      Here's a simple modification to the example code that will show what the regex matched when you typed it in:
      #!/usr/bin/perl
      while (<>) {
          chomp;  # chomp so this next output is pretty;
                  # newlines aren't discarded when you read with <>
          print "\"$1\" was matched out of \"$_\"\n" if m/(your_pattern)/;
      }
      Be sure to put parentheses around the entire regex so that it saves what it matches in $1.

      Do recall that while the regex quantifiers are greedy by default, you can suffix them with ? and they'll become nongreedy. An example:
      #!/usr/bin/perl
      while (<>) {
          chomp;
          print "\"$1\" was matched out of \"$_\"\n" if m/(\w{5}?)/;
      }
      This will match "mywor" out of "myword".

      Have fun regexing.
        Putting a ? on a quantifier that isn't variable-length is like having any color car you want, as long as it's black. There's no difference between a greedy and a non-greedy {5}.

        A greedy quantifier will take as many characters as it can, and then start backtracking until whatever follows it matches. A non-greedy one will take as few characters as it can, and then take more until whatever is supposed to follow it is found.

        The PerlMonk tr/// Advocate
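
        Both points in the reply above can be checked with a short sketch. The sample strings and variable names are assumptions made for illustration.

```perl
use strict;
use warnings;

# A fixed-length quantifier has nothing to be greedy about:
# {5} and {5}? both take exactly five characters.
my $word = "myword";
my ($fixed_greedy) = $word =~ /(\w{5})/;
my ($fixed_lazy)   = $word =~ /(\w{5}?)/;

# Greedy .+ takes everything, then gives characters back until X can follow.
my $text = "aXbXc";
my ($greedy) = $text =~ /(.+)X/;

# Nongreedy .+? takes one character, then takes more until X can follow.
my ($lazy)     = $text =~ /(.+?)X/;
my ($extended) = "abX" =~ /(.+?)X/;   # one char is not enough here

print "fixed: '$fixed_greedy' vs '$fixed_lazy'\n";   # fixed: 'mywor' vs 'mywor'
print "greedy: '$greedy'\n";                         # greedy: 'aXb'
print "nongreedy: '$lazy'\n";                        # nongreedy: 'a'
print "nongreedy, extended: '$extended'\n";          # nongreedy, extended: 'ab'
```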
      What is it that you want to search? What is the pattern you are searching for?
RE: Quantifiers in regular expressions
by misty (Hermit) on May 11, 2000 at 16:20 UTC
    This tutorial could have more examples, as it is not that easy to grasp what is going on without *seeing* it work.
      I agree with that. I am new to Perl and it knocks me off.
        Hear, hear, more examples.
