<?xml version="1.0" encoding="windows-1252"?>
<node id="998239" title="Re^3: RegEx + vs. {1,}" created="2012-10-10 11:11:15" updated="2012-10-10 11:11:15">
<type id="11">
note</type>
<author id="645387">
grizzley</author>
<data>
<field name="doctext">
So if that's acceptable for you, use a while loop to determine the maximum number of occurrences. A substring of at least two characters cannot occur more than length / 2 times, so start with that maximum and decrease it until the pattern matches:
&lt;code&gt;
$x   = "abcdefgxxabcdefgzzabcdsjfhkdfab";
$len = int(length($x) / 2);
while ($x !~ /(\w{2,})(.*?\1){$len}/) {
    $len--;
}
$x =~ /(\w{2,})(.*?\1){$len}/;    # 'strange line'
print $1;
&lt;/code&gt;
(Note to self: I don't know why I have to add the 'strange line'. Without it nothing is printed, even though $len ends up with the right value: 3, since 'ab' occurs four times.)
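&lt;p&gt;The 'strange line' is most likely needed because capture variables such as $1 are dynamically scoped to the enclosing block. A while loop's condition is evaluated inside the loop's own scope, so once the loop exits, $1 reverts to whatever it was before the loop (here undef).

The sketch below avoids the second match by copying the capture into a lexical inside the loop condition. It assumes Perl 5 with strict and warnings. The variable $best is my own addition, and the inner group is made non-capturing so the list-context match returns only the substring:
&lt;code&gt;
use strict;
use warnings;

my $x   = "abcdefgxxabcdefgzzabcdsjfhkdfab";
my $len = int(length($x) / 2);
my $best;

# Count down from the largest possible repeat count. The capture is copied
# into the lexical $best inside the condition, because $1 itself reverts
# once the loop's scope is left. The $len == 0 test stops the loop if no
# substring of two or more word characters repeats at all.
until ($len == 0 or ($best) = $x =~ /(\w{2,})(?:.*?\1){$len}/) {
    $len--;
}

if (defined $best) {
    print "$best occurs ", $len + 1, " times\n";
}
else {
    print "no repeated substring\n";
}
&lt;/code&gt;
With the string above this should print "ab occurs 4 times", with $len ending at 3. The list assignment in the condition is true only when the match succeeds, because a failed match returns an empty list.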
&lt;p&gt;I also tried to generate the list of repeat counts and combine it into a single regexp:
&lt;code&gt;
$ perl -le '$x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; $len=int(length($x)/2); $restring = join"|", map {"(?:.*?\\1){$_}"} reverse(1..$len); print $restring; print $1 if $x =~ /(\w{2,})($restring)/;'

(?:.*?\1){15}|(?:.*?\1){14}|(?:.*?\1){13}|(?:.*?\1){12}|(?:.*?\1){11}|(?:.*?\1){10}|(?:.*?\1){9}|(?:.*?\1){8}|(?:.*?\1){7}|(?:.*?\1){6}|(?:.*?\1){5}|(?:.*?\1){4}|(?:.*?\1){3}|(?:.*?\1){2}|(?:.*?\1){1}
abcdefg
&lt;/code&gt;
but it does not work as expected: it prints abcdefg instead of ab (probably some stupid mistake; maybe someone else can tell what's wrong with it).</field>
<field name="root_node">
998200</field>
<field name="parent_node">
998223</field>
</data>
</node>
