PerlMonks  

Re: Common Regex Gotchas

by Anonymous Monk
on Nov 27, 2001 at 03:23 UTC (#127652=note)


in reply to Common Regex Gotchas

I found the Greedy section to be quite confusing. First of all, I think you probably have a couple of slash-s's in your HTML that are keeping text from printing, which makes the last couple of paragraphs very difficult to read and understand. Also, I'm still confused about why there is a match at all in the first example. Why doesn't the engine continue backwards past the whitespace and look for a </tag> string? Finally, why does the last example (still in the Greedy section) work? If, when creating the example string, I put a carriage return after the </tag>, there shouldn't be any whitespace to match on, right? Finally, finally, thanks for putting this together... it's really speeding my ramp along...


Re: Re: Common Regex Gotchas
by Anonymous Monk on Nov 27, 2001 at 03:36 UTC
    Sorry - re-reading it, I realized that I accidentally inserted HTML tags that are not showing up.

    I found the Greedy section to be quite confusing. First of all, I think you probably have a couple of slash-s's in your HTML that are keeping text from printing, which makes the last couple of paragraphs very difficult to read and understand.

    Also, I'm still confused about why there is a match at all in the first example. Why doesn't the engine continue backwards past the whitespace and look for a <\/tag> string?

    Finally, why does the last example (still in the Greedy section) work? If, when creating the example string, I put a carriage return after the <\/tag>, there shouldn't be any whitespace to match on, right?

    Finally, finally, thanks for putting this together... it's really speeding my ramp along...

      Why doesn't the engine continue backwards past the whitespace and look for a <\/tag> string?

      Because the engine takes the first match it finds at the leftmost possible starting position, and a greedy quantifier such as .* grabs as much as it can. When the engine hits .*, it jumps all the way to the end of the string and then backtracks one character at a time, trying to match the next required part of the pattern. Because it is working backwards from the end, the first </tag> it finds is the last one in the string. That satisfies the pattern, so the engine stops there and never backtracks further to find a shorter match.
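
      Here is a minimal sketch of that behavior. The sample string and pattern are made up for illustration (they are not the ones from the original node), but they should show the same greedy effect:

```perl
use strict;
use warnings;

# Hypothetical sample string with two tag pairs; not taken from the original node.
my $text = "<tag>one</tag> <tag>two</tag>";

# Greedy: .* first runs to the end of the string, then backtracks
# just far enough for </tag> to match, which is the LAST </tag>.
my ($greedy) = $text =~ m{<tag>(.*)</tag>};

print "greedy capture: [$greedy]\n";
```

      The capture spans from the first <tag> to the last </tag>, so it swallows the closing tag and whitespace in the middle rather than stopping at the first </tag>.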

      If, when creating the example string, I put a carriage return after the <\/tag>, there shouldn't be any whitespace to match on, right?

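      A newline is itself whitespace, so \s can still match it. The /s flag allows the '.' token to match newlines as well. Adding '?' after the quantifier (.*?) makes it minimal, which avoids the jump-to-end-then-backtrack behavior: the engine tries to match as few characters as possible and only takes more when the rest of the pattern fails. It works the way you'd expect.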
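
      A rough sketch of the three cases, again with a hypothetical string and patterns of my own rather than the ones from the original node:

```perl
use strict;
use warnings;

# Hypothetical sample: a newline inside the first tag body and after each closing tag.
my $text = "<tag>one\ntwo</tag>\n<tag>three</tag>\n";

# Without /s, '.' stops at the newline, so the first <tag> can never reach
# a </tag>; the engine moves on and matches at the second <tag> instead.
my ($no_s) = $text =~ m{<tag>(.*)</tag>};

# With /s, '.' also matches "\n", and greedy .* runs to the last </tag>.
my ($greedy_s) = $text =~ m{<tag>(.*)</tag>}s;

# With /s and the minimal .*?, the match stops at the first </tag>.
my ($minimal) = $text =~ m{<tag>(.*?)</tag>}s;

# A newline counts as whitespace, so \s matches right after </tag>.
my $newline_is_space = ($text =~ m{</tag>\s}) ? 1 : 0;

print "no /s:        [$no_s]\n";
print "/s greedy:    [$greedy_s]\n";
print "/s minimal:   [$minimal]\n";
print "\\s after tag: $newline_is_space\n";
```

      Only the last pattern, with both /s and .*?, captures just the body of the first tag.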

      Does that clear it up? I've also touched up the formatting somewhat.
