Perl Monk, Perl Meditation PerlMonks

### Re: RegEx + vs. {1,}

by grizzley (Chaplain)
 on Oct 10, 2012 at 12:01 UTC ( #998210=note )

in reply to RegEx + vs. {1,}

Because you are trying to match a string that occurs at least twice, and 'abcdefg' is the first candidate that qualifies. The only approach I can think of for your problem is this:
```
$ perl -le 'print $x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; print $1 if $x =~ /(\w{2,})(.*?\1){2}/;'
abcdefgxxabcdefgzzabcdsjfhkdfab
abcd

$ perl -le 'print $x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; print $1 if $x =~ /(\w{2,})(.*?\1){3}/;'
abcdefgxxabcdefgzzabcdsjfhkdfab
ab

$ perl -le 'print $x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; print $1 if $x =~ /(\w{2,})(.*?\1){4}/;'
abcdefgxxabcdefgzzabcdsjfhkdfab
```
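I.e. the problem is that you have to say exactly how many occurrences you want. There is no 'greediness' across candidates in this case: you can't say "I want as many occurrences as possible". The engine tries the longest candidate for \1 first and stops at the first one that satisfies the quantifier, so only the lowest allowed number of occurrences decides which candidate is chosen.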
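To make that concrete, here is a minimal sketch that runs the same string through `+`, `{1,}` and a few exact counts. It assumes the pattern from the one-liners above, with the repeat group made non-capturing so that a list-context match returns only the candidate; the variable names are just for illustration.

```perl
use strict;
use warnings;

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";

# Open-ended quantifiers: the longest candidate that repeats at all wins.
my ($plus)  = $x =~ /(\w{2,})(?:.*?\1)+/;
my ($brace) = $x =~ /(\w{2,})(?:.*?\1){1,}/;
print "+    : $plus\n";
print "{1,} : $brace\n";

# Exact counts: asking for more repeats forces shorter candidates.
my @caps;
for my $n (1 .. 4) {
    # In list context a failed match returns an empty list, so $cap is undef.
    my ($cap) = $x =~ /(\w{2,})(?:.*?\1){$n}/;
    push @caps, $cap;
    printf "{%d}  : %s\n", $n, defined $cap ? $cap : "(no match)";
}
```

Both `+` and `{1,}` should report 'abcdefg', and the exact counts should step down through 'abcd' and 'ab' before failing at `{4}`, in line with the one-liners above.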

Re^2: RegEx + vs. {1,}
by Sewi (Friar) on Oct 10, 2012 at 13:01 UTC
So, if that's acceptable for you, use a while loop to determine the maximum number of occurrences. There can be no more than length / 2 occurrences (each one is at least two characters long), so start with this maximum and decrease it until the match succeeds:
```
$x = "abcdefgxxabcdefgzzabcdsjfhkdfab";
$len = int(length($x) / 2);    # upper bound on the number of repeats
# Lower the repeat count until the pattern matches.
while ($x !~ /(\w{2,})(.*?\1){$len}/) {
    $len--;
}
$x =~ /(\w{2,})(.*?\1){$len}/; # 'strange line'
print $1
```
(Note to self: I do not know why I have to add the 'strange line'. Without it nothing is printed, although $len is correctly set to 3, i.e. four occurrences in total.)
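A likely explanation for the 'strange line': the capture variables are dynamically scoped to the enclosing block, and the while statement (condition included) appears to count as one, so the $1 set by the successful match is gone once the loop exits. Here is a minimal sketch that avoids the second match by copying the capture while it is still in scope. The helper name `most_repeated` is made up for this example, and the repeat group is non-capturing.

```perl
use strict;
use warnings;

my $x = "abcdefgxxabcdefgzzabcdsjfhkdfab";

# Hypothetical helper: returns (substring, repeat count) for the highest
# repeat count that still matches, or an empty list if nothing repeats.
sub most_repeated {
    my ($str) = @_;
    for (my $n = int(length($str) / 2); $n >= 1; $n--) {
        if ($str =~ /(\w{2,})(?:.*?\1){$n}/) {
            # Copy the capture inside the block where the match succeeded.
            my $found = $1;
            return ($found, $n);
        }
    }
    return;
}

my ($sub, $repeats) = most_repeated($x);
print "$sub repeats $repeats more time(s)\n" if defined $sub;
```

Counting down from the upper bound keeps the idea of the loop above; the only change is where $1 is read.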

I tried to generate the list and include it in one regexp:

```
$ perl -le '$x = "abcdefgxxabcdefgzzabcdsjfhkdfab"; $len=int(length($x)/2); $restring = join"|", map {"(?:.*?\\1){$_}"} reverse(1..$len); print $restring; print $1 if $x =~ /(\w{2,})($restring)/;'
(?:.*?\1){15}|(?:.*?\1){14}|(?:.*?\1){13}|(?:.*?\1){12}|(?:.*?\1){11}|(?:.*?\1){10}|(?:.*?\1){9}|(?:.*?\1){8}|(?:.*?\1){7}|(?:.*?\1){6}|(?:.*?\1){5}|(?:.*?\1){4}|(?:.*?\1){3}|(?:.*?\1){2}|(?:.*?\1){1}
abcdefg
```
but it does not work as expected: it still prints 'abcdefg' instead of 'ab' (probably some stupid mistake; maybe someone else can tell what's wrong with it).
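One plausible reading of that result: the alternation is tried in full for each candidate before the engine backs off to a shorter one. For 'abcdefg' every branch down to `{2}` fails, but `{1}` succeeds, so the match ends there and 'ab' is never tried. The sketch below captures what the alternation consumed and counts the candidate inside it. It assumes the same pattern as the one-liner above; the names `$cand`, `$tail` and `$repeats` are only for illustration.

```perl
use strict;
use warnings;

my $x   = "abcdefgxxabcdefgzzabcdsjfhkdfab";
my $len = int(length($x) / 2);

# Same alternation as above: highest repeat count listed first.
my $restring = join "|", map { "(?:.*?\\1){$_}" } reverse 1 .. $len;

# List context returns ($1, $2): the candidate and what the alternation ate.
my ($cand, $tail) = $x =~ /(\w{2,})($restring)/;

my $repeats = 0;
if (defined $cand) {
    # Count how often the candidate occurs in the part the alternation matched.
    $repeats = () = $tail =~ /\Q$cand\E/g;
    print "candidate: $cand\n";
    print "tail     : $tail\n";
    print "repeats  : $repeats\n";
}
```

If this reading is right, the tail holds a single further 'abcdefg', which means the pattern behaves just like `(.*?\1)+`.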
