PerlMonks  

Re: RegEx + vs. {1,}

by ELISHEVA (Prior)
on Oct 10, 2012 at 14:16 UTC (#998232)


in reply to RegEx + vs. {1,}

If you want a list of all two-letter patterns that appear at least twice somewhere in your string, you need to make three changes to your regex.

  1. Make (\w{2,}) non-greedy by adding a "?" to the end, e.g. (\w{2,}?).
  2. Wrap what comes after (\w{2,}?) in a zero-width lookahead group. Otherwise the match consumes everything up to and including the second occurrence of "ab", and you miss all the matches in between.
  3. Handle repetitions of your regex slightly differently. Instead of /( mumblefoo )+/ you need /mumblefoo/g. Using a + the way you did only gets you the last match found, because each time the + causes the group to repeat, the new capture replaces the previous one.
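Point 3 is easy to see in isolation. Here is a minimal sketch using a made-up string "abcdef" and a plain (\w{2}) group (both are my own illustration, not your original regex):

```perl
use strict;
use warnings;

my $s = "abcdef";    # illustrative input, not from the original question

# Quantified capture group: the group matches three times in a row,
# but the capture only keeps the text from the final iteration.
my @last_only = $s =~ /(\w{2})+/;

# Same group without the +, repeated with /g in list context:
# the capture from every match is returned.
my @all = $s =~ /(\w{2})/g;

print "with +:  <@last_only>\n";    # <ef>
print "with /g: <@all>\n";          # <ab cd ef>
```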

Taken together, these changes make your regex look like this: /(\w{2,}?)(?=.*?\1)/g. For example:

    my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";
    print "$x\n";
    print "<" . join('|', $x =~ /(\w{2,}?)(?=.*?\1)/g) . ">\n";
    # outputs: <ab|cd|ef|ab|cd|ab>
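To see what changes 1 and 2 each buy you, it may help to run the same string through the variants that leave one of them out. This is only a sketch for comparison; the variable names are mine:

```perl
use strict;
use warnings;

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";

# Change 1 left out: the greedy quantifier grabs the longest repeated run.
my @greedy    = $x =~ /(\w{2,})(?=.*?\1)/g;

# Change 2 left out: without the lookahead, each match swallows the text
# up to and including the second occurrence, so later patterns are skipped.
my @consuming = $x =~ /(\w{2,}?).*?\1/g;

# Both changes applied.
my @both      = $x =~ /(\w{2,}?)(?=.*?\1)/g;

print "greedy:    <", join('|', @greedy),    ">\n";   # <abcdefg|abcd|ab>
print "consuming: <", join('|', @consuming), ">\n";   # <ab|cd>
print "both:      <", join('|', @both),      ">\n";   # <ab|cd|ef|ab|cd|ab>
```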

You can find more info on zero-width lookaheads in the Extended Patterns section of the perlre manpage on perldoc.


Re^2: RegEx + vs. {1,}
by choroba (Abbot) on Oct 10, 2012 at 14:25 UTC
    Or, just remove the comma from the quantifier: \w{2}
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Indeed. There can never be a three-character sequence in your string that occurs more frequently than a two-character sequence, because every three-character sequence contains two two-character sequences.

      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'
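A quick way to check the \w{2} suggestion against the sample string from the parent node: since a longer repeated sequence always starts with a repeated two-character one, the lazy \w{2,}? and the fixed \w{2} should return the same list here. A small sketch (variable names are mine):

```perl
use strict;
use warnings;

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";

my @lazy  = $x =~ /(\w{2,}?)(?=.*?\1)/g;   # regex from the parent node
my @fixed = $x =~ /(\w{2})(?=.*?\1)/g;     # comma removed from the quantifier

print "lazy:  <", join('|', @lazy),  ">\n";   # <ab|cd|ef|ab|cd|ab>
print "fixed: <", join('|', @fixed), ">\n";   # <ab|cd|ef|ab|cd|ab>
```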
        But what should be returned for a string like 'abcabcabcdef', where the three-character and two-character sequences occur the same number of times? 'ab' or 'abc'? I assume the OP wants the longer one.
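If the longer sequence should win a tie, one way to get there is to skip the regex and count substrings directly. This is only a sketch under that assumption: most_frequent is a hypothetical helper of my own, it counts overlapping occurrences, and it treats every character alike rather than restricting itself to \w.

```perl
use strict;
use warnings;

# Hypothetical helper: count every substring of length $min..$max
# (overlapping occurrences included) and return the most frequent one,
# breaking ties in favour of the longer substring.
sub most_frequent {
    my ($str, $min, $max) = @_;
    my %count;
    for my $len ($min .. $max) {
        for my $pos (0 .. length($str) - $len) {
            $count{ substr($str, $pos, $len) }++;
        }
    }
    my ($best) = sort {
        $count{$b} <=> $count{$a}           # most frequent first
            or length($b) <=> length($a)    # then longest
            or $a cmp $b                    # then alphabetical, for stable output
    } keys %count;
    return ($best, $count{$best});
}

my ($best, $n) = most_frequent('abcabcabcdef', 2, 3);
print "$best occurs $n times\n";    # abc occurs 3 times
```

Here 'ab', 'bc' and 'abc' all occur three times, and the length tie-break picks 'abc'.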
