<?xml version="1.0" encoding="windows-1252"?>
<node id="998232" title="Re: RegEx + vs. {1,}" created="2012-10-10 10:16:46" updated="2012-10-10 10:16:46">
<type id="11">
note</type>
<author id="720219">
ELISHEVA</author>
<data>
<field name="doctext">
&lt;p&gt;If you want a list of all two-letter patterns that appear at least twice somewhere in your string, you need to make three changes to your regex.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You need to make &lt;c&gt;(\w{2,})&lt;/c&gt; non-greedy by adding a "?" after the quantifier, i.e. &lt;c&gt;(\w{2,}?)&lt;/c&gt;.&lt;/li&gt;
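&lt;li&gt;You need to wrap what comes after &lt;c&gt;(\w{2,}?)&lt;/c&gt; in a zero-width lookahead group. Otherwise &lt;c&gt;.*?\1&lt;/c&gt; consumes everything up to and including the repeated text, so the next match cannot start until after the second occurrence of "ab", and you miss all the matches in between.&lt;/li&gt;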
&lt;li&gt;You need to handle repetition of your regex slightly differently. Instead of &lt;c&gt;/( mumblefoo )+/&lt;/c&gt; you need &lt;c&gt;/mumblefoo/g&lt;/c&gt;. A capture group repeated with + only keeps the last match found: each time the + makes the group repeat, the new capture replaces the previous one. With /g in list context, on the other hand, you get the capture from every match.&lt;/li&gt;
&lt;/ol&gt;
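
&lt;p&gt;Here is a rough sketch of changes 2 and 3 side by side. Your original regex isn't quoted here, so the consuming pattern &lt;c&gt;/(\w{2,}?).*?\1/g&lt;/c&gt; is only a hypothetical stand-in for "no lookahead", and "abcdef" is just a made-up test string for the + vs. /g comparison:&lt;/p&gt;

&lt;code&gt;
use strict;
use warnings;

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";

# Change 3: a capture group repeated with + keeps only its last pass,
# while /g in list context returns the capture from every match.
my @last_only = "abcdef" =~ /(\w\w)+/;   # ("ef")
my @every     = "abcdef" =~ /(\w\w)/g;   # ("ab", "cd", "ef")

# Change 2: without the lookahead, .*?\1 consumes the text up to the
# repeat, so the next /g match cannot start until after it.
my @consumed  = $x =~ /(\w{2,}?).*?\1/g;     # hypothetical "no lookahead" form
my @lookahead = $x =~ /(\w{2,}?)(?=.*?\1)/g;

# pos() after one scalar-context /g match shows where the next search starts.
my ($c, $l) = ($x, $x);
$c =~ /(\w{2,}?).*?\1/g     or die "no match";
$l =~ /(\w{2,}?)(?=.*?\1)/g or die "no match";

print join('|', @last_only), "\n";   # ef
print join('|', @every), "\n";       # ab|cd|ef
print join('|', @consumed), "\n";    # ab|cd
print join('|', @lookahead), "\n";   # ab|cd|ef|ab|cd|ab
print pos($c), " ", pos($l), "\n";   # 11 2
&lt;/code&gt;

&lt;p&gt;The pos() values show why the lookahead matters: the consuming version resumes its search at offset 11, already past the second "ab", while the lookahead version resumes at offset 2, right after the first captured pair.&lt;/p&gt;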

&lt;p&gt;Taken together, these changes make your regex look like this: &lt;c&gt;/(\w{2,}?)(?=.*?\1)/g&lt;/c&gt;. For example:&lt;/p&gt;

&lt;code&gt;
use strict;
use warnings;

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";
print $x, "\n";
print "&lt;" . join('|', $x =~ /(\w{2,}?)(?=.*?\1)/g), "&gt;\n";

# prints the string itself, then: &lt;ab|cd|ef|ab|cd|ab&gt;
&lt;/code&gt;
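
&lt;p&gt;If it helps to see each piece labelled, here is a sketch of the same regex written with the /x modifier, plus the greedy form from change 1 for comparison. The comments are one reading of the pattern, not anything official:&lt;/p&gt;

&lt;code&gt;
use strict;
use warnings;

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";

my @pairs = $x =~ /
    ( \w{2,}? )     # capture 2+ word chars, as few as possible (non-greedy)
    (?= .*? \1 )    # zero-width: the same text must appear again later,
                    # but none of it is consumed, so the next match can
                    # start right after the captured text
/xg;

# Change 1 undone: the greedy form grabs the longest repeated run instead
my @greedy = $x =~ /(\w{2,})(?=.*?\1)/g;

print join('|', @pairs), "\n";    # ab|cd|ef|ab|cd|ab
print join('|', @greedy), "\n";   # abcdefg|abcd|ab
&lt;/code&gt;

&lt;p&gt;Because the greedy &lt;c&gt;\w{2,}&lt;/c&gt; takes as much as it can before backing off, at each starting point it returns the longest run that repeats later (abcdefg, then abcd, then ab) rather than two-letter pairs.&lt;/p&gt;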

&lt;p&gt;You can find more info on zero-length lookaheads in [http://perldoc.perl.org/perlre.html#Extended-Patterns|the Extended Patterns section of the perlre manpage on perldoc].&lt;/p&gt;</field>
<field name="root_node">
998200</field>
<field name="parent_node">
998200</field>
</data>
</node>
