PerlMonks  

RE: Quantifiers in regular expressions

by Anonymous Monk
on Apr 23, 2000 at 01:35 UTC


in reply to Quantifiers in regular expressions

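As a suggestion, this is something I had trouble with when I was learning regexes. Non-greedy not only matches as little as possible, it also works in the opposite direction from a greedy match: instead of grabbing everything and then backtracking, it starts small and takes more only when the rest of the pattern fails. This is roughly the example that gave me trouble until I read about backtracking in the man page.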

$string = "<foo>...</foo><bar>...</bar>";

# This matches <foo></bar>, not what we want.
$string =~ /\<.*\>(.*)\</.*\>/;
# this works, i think.
$string =~ /\<.*?\>(.*?)\</.*?\>/;

Even if it doesn't work, I hope you get the idea. I'd show the exact example of what I was doing when I ran into this, but it's overly-complicated (removing parts of tags from a string, where there's a list of tags and attributes for those tags that need to be removed).

Greediness relies a lot on backtracking, so to avoid frustrating another fledgling Perl coder with the same problems, it's at least worth a note wherever greediness is at issue.


RE: RE: Quantifiers in regular expressions
by pkn (Initiate) on Jul 20, 2000 at 04:19 UTC
    I don't understand this:
    $string = "<foo>...</foo><bar>...</bar>";
    # This matches <foo></bar>, not what we want.
    $string =~ /\<.*\>(.*)\</.*\>/;
    How does it match? And I thought that the matching operator was m//, not m///. What does m/// do?
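      Yeah, AM escapes the wrong chars: "<" and ">" don't need a backslash, but the "/" in "</" does, otherwise it ends the pattern early (there is no m///). Let's put it right. If you say: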
      $string =~ /<.*>(.*)<\/.*>/;
      on the string "<foo>...</foo><bar>...</bar>", the overall match runs from "<foo>" all the way to "</bar>", because the "*" quantifier is "greedy", which means it tries to match as much as possible. A "." in a regex matches any character except a newline, so ".*" first runs to the end of the string. Then the rest of the regex is evaluated by backtracking: the regex engine is now at the end of the string and steps back one character at a time until the rest of the pattern matches. I hope this was correct.
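To see the difference for yourself, here is a minimal sketch. The tag contents "one" and "two" are made up so the two captures can be told apart, and the extra outer parentheses are only there to capture the whole match; neither is part of the original example.

```perl
use strict;
use warnings;

# Sample input; "one" and "two" are illustrative contents, not from the original post.
my $string = "<foo>one</foo><bar>two</bar>";

# Greedy: the first .* runs to the end of the string and gives back only
# as much as it must, so the match spans the whole string.
my ($greedy_whole, $greedy_inner) = $string =~ /(<.*>(.*)<\/.*>)/;

# Non-greedy: each .*? starts empty and grows one character at a time
# until the rest of the pattern can match.
my ($lazy_whole, $lazy_inner) = $string =~ /(<.*?>(.*?)<\/.*?>)/;

print "greedy: inner = '$greedy_inner', whole = '$greedy_whole'\n";
print "lazy:   inner = '$lazy_inner', whole = '$lazy_whole'\n";
```

The greedy version swallows both tag pairs and captures only the last piece of text, while the non-greedy one stops at the first closing tag.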

      Update:
      Damn HTML escaping :) I fixed it, so the strings actually show up.
      This is matching tag delimiters and the text that belongs to them. Let's say you have text in H1 format, <h1>This is my text</h1>, and you want to replace both the heading and the text in only one step. Then you'll need a pattern like the one mentioned above to find a matching pair.
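A one-step replacement along those lines might look like the sketch below. The input string, the choice of <h2> as the new tag, and uppercasing the text are all assumptions picked for illustration; the s{...}{...} delimiters are used only so the "/" in the closing tag needs no backslash.

```perl
use strict;
use warnings;

# Hypothetical input with two headings, for illustration only.
my $html = "<h1>This is my text</h1><p>Body</p><h1>Second</h1>";

# .*? keeps each match inside a single <h1>...</h1> pair. The replacement
# swaps the tag and rewrites the captured text (\U...\E uppercases it).
(my $replaced = $html) =~ s{<h1>(.*?)</h1>}{<h2>\U$1\E</h2>}g;

print "$replaced\n";
```

With a greedy .* instead, the first match would run from the first <h1> to the last </h1> and swallow the paragraph in between.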
Doubt in Quantifiers in regular expressions
by Anonymous Monk on Aug 16, 2002 at 11:26 UTC
    What is it that you want to search for? What is the pattern you are searching for?
Re: RE: Quantifiers in regular expressions
by ninja-joe (Monk) on May 19, 2004 at 19:59 UTC
    Here's a simple modification to the example code that will show what the regex matched when you typed it in:
    #!/usr/bin/perl
    while (<>) {
        chomp;  # chomp so this next output is pretty;
                # newlines aren't discarded when you read with <>
        print "\"$1\" was matched out of \"$_\"\n" if m/(your_pattern)/;
    }
    Be sure to include the parentheses around the entire regex; that way it will save what it matches in $1.

    Do recall that while the regex quantifiers are greedy by default, you can suffix them with ? and they'll go non-greedy. An example:
    #!/usr/bin/perl
    while (<>) {
        chomp;
        print "\"$1\" was matched out of \"$_\"\n" if m/(\w{5}?)/;
    }
    This will match "mywor" out of "myword"

    Have fun regexing.
      Putting a ? on a quantifier that isn't variable-length is like having any color car you want, as long as it's black. There's no difference between a greedy and a non-greedy {5}.

      A greedy quantifier will take as many characters as it can, and then start backtracking until whatever follows it matches. A non-greedy one will take as few characters as it can, and then take more until whatever is supposed to follow it is found.
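Both points can be checked with a short sketch. The {2,5} range and the trailing "d" are assumptions added here to make the difference visible; only the {5}? pattern and the word "myword" come from the post above.

```perl
use strict;
use warnings;

my $word = "myword";

# A fixed count has nothing to give back and nothing more to take,
# so {5} and {5}? capture the same text.
my ($fixed_greedy) = $word =~ /(\w{5})/;
my ($fixed_lazy)   = $word =~ /(\w{5}?)/;

# A variable-length range shows the difference.
my ($range_greedy) = $word =~ /(\w{2,5})/;   # takes as many as it can
my ($range_lazy)   = $word =~ /(\w{2,5}?)/;  # stops at the minimum

# Whatever follows the quantifier forces a lazy match to keep growing.
my ($lazy_to_d) = $word =~ /(\w+?)d/;

print "fixed: '$fixed_greedy' vs '$fixed_lazy'\n";
print "range: '$range_greedy' vs '$range_lazy'\n";
print "lazy up to d: '$lazy_to_d'\n";
```

The last pattern is the interesting one: \w+? would be happy with a single character, but it has to keep taking more until the "d" that follows it can match.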


      The PerlMonk tr/// Advocate
