Re: RegEx + vs. {1,}

by grizzley (Chaplain)
on Oct 10, 2012 at 12:01 UTC

in reply to RegEx + vs. {1,}

Because you are trying to match a string that occurs twice, and 'abcdefg' is the first correct candidate. The only approach I can think of for your problem is this:
$ perl -le 'print $x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; print $1 if $x =~ /(\w{2,})(.*?\1){2}/;'
abcdefgxxabcdefgzzabcdsjfhkdfab
abcd
$ perl -le 'print $x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; print $1 if $x =~ /(\w{2,})(.*?\1){3}/;'
abcdefgxxabcdefgzzabcdsjfhkdfab
ab
$ perl -le 'print $x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; print $1 if $x =~ /(\w{2,})(.*?\1){4}/;'
abcdefgxxabcdefgzzabcdsjfhkdfab
I.e., the problem is that you have to say exactly how many occurrences you want. There is no 'greediness' in this case: you can't say "I want as many occurrences as possible", because only the lowest possible number of occurrences will be chosen.
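To illustrate the point about greediness, here is a minimal sketch that swaps the exact count for an open-ended `{$min,}`. The helper name `word_with_at_least` is mine, not something from the thread. The greedy `\w{2,}` is settled first, so the engine keeps the longest word that meets the minimum instead of looking for the word with the most repeats.

```perl
use strict;
use warnings;

# Hypothetical helper: return the word captured when a run of 2+ word
# characters is followed by at least $min further copies of itself,
# or undef when nothing matches.
sub word_with_at_least {
    my ($text, $min) = @_;
    return $text =~ /(\w{2,})(?:.*?\1){$min,}/ ? $1 : undef;
}

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";
for my $min (1 .. 4) {
    my $word = word_with_at_least($x, $min);
    printf "{%d,} -> %s\n", $min, defined $word ? $word : '(no match)';
}
```

For this string the loop should report abcdefg, abcd, ab and then no match: raising the minimum is the only thing that pushes the engine towards a shorter word.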

Re^2: RegEx + vs. {1,}
on Oct 10, 2012 at 13:01 UTC
      So, if that's acceptable for you, use a while loop to determine the maximum number of occurrences. There can be no more than length / 2 occurrences, so start with that maximum and decrease it until the match succeeds:
      $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";
      $len = int(length($x)/2);
      while ($x !~ /(\w{2,})(.*?\1){$len}/) { $len-- };
      $x =~ /(\w{2,})(.*?\1){$len}/; # 'strange line'
      print $1
      (Note to self: I do not know why I have to add the 'strange line'. Without it nothing is printed, even though $len is set correctly: it ends at 3, i.e. 4 occurrences in total.)
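A likely explanation for the 'strange line' is that Perl scopes capture variables to the enclosing block, and a match made in a `while` condition belongs to the loop's scope, so `$1` is gone once the loop exits. The sketch below sidesteps this by copying the capture in the same statement as the match. The helper name `most_repeated` and the `(?:...)` grouping are my own choices, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: find the largest $n for which some word of 2+
# characters is followed by $n more copies of itself.
# Returns ($word, $n), or an empty list when nothing repeats.
sub most_repeated {
    my ($text) = @_;
    for (my $n = int(length($text) / 2); $n >= 1; $n--) {
        # A match in list context returns its captures, so $word is
        # copied before any scope can discard $1.
        if (my ($word) = $text =~ /(\w{2,})(?:.*?\1){$n}/) {
            return ($word, $n);
        }
    }
    return;
}

my ($word, $n) = most_repeated("abcdefgxxabcdefgzzabcdsjfhkdfab");
print "$word occurs ", $n + 1, " times\n" if defined $word;
```

Counting down with a bounded `for` loop also means the search stops at 1 when the string has no repeated word, instead of decrementing forever.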

      I tried to generate the list and include it in one regexp:

      $ perl -le '$x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; $len=int(length($x)/2); $restring = join"|", map {"(?:.*?\\1){$_}"} reverse(1..$len); print $restring; print $1 if $x =~ /(\w{2,})($restring)/;'
      (?:.*?\1){15}|(?:.*?\1){14}|(?:.*?\1){13}|(?:.*?\1){12}|(?:.*?\1){11}|(?:.*?\1){10}|(?:.*?\1){9}|(?:.*?\1){8}|(?:.*?\1){7}|(?:.*?\1){6}|(?:.*?\1){5}|(?:.*?\1){4}|(?:.*?\1){3}|(?:.*?\1){2}|(?:.*?\1){1}
      abcdefg
      but it does not work as expected (probably some stupid mistake, maybe someone else can tell what's wrong with it).
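As far as I can tell, nothing is mistyped here; the order of backtracking is the issue. The engine fixes the longest possible `$1` first and only then walks the alternatives, so 'abcdefg' with `{1}` wins before 'ab' with `{3}` is ever tried. A minimal sketch of one way around it, assuming Perl 5.10 or later for the branch-reset group `(?|...)`, is to repeat the capture inside every alternative. The variable names are mine.

```perl
use strict;
use warnings;

my $x   = "abcdefgxxabcdefgzzabcdsjfhkdfab";
my $max = int(length($x) / 2);

# Original idea: one capture, then an alternation of repeat counts.
# The longest $1 is settled before the alternatives are walked.
my $inner = join '|', map { "(?:.*?\\1){$_}" } reverse 1 .. $max;
my ($inner_word) = $x =~ /(\w{2,})(?:$inner)/;

# Variant: put the capture inside each alternative and wrap them in a
# branch-reset group, so every alternative still fills $1.
my $outer = join '|', map { "(\\w{2,})(?:.*?\\1){$_}" } reverse 1 .. $max;
my ($outer_word) = $x =~ /(?|$outer)/;

print "alternation after the capture:   $inner_word\n";
print "capture inside each alternative: $outer_word\n";
```

For this string the first line should show abcdefg and the second ab. Note that the variant only maximises the count at the leftmost position where any alternative matches, so the loop over `$len` remains the more general approach.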

