http://www.perlmonks.org?node_id=298331

almaric has asked for the wisdom of the Perl Monks concerning the following question:

Hello,

Just a quick question. A short scan over a few regexp tutorials didn't lead me to an answer.

$a="aaaa"; @a=$a=~m/aa/g; print join ( "-", @a );
outputs:
aa-aa
From another point of view, though, one could find 3 occurrences of 'aa' in 'aaaa'. I think this is a question of how matching is defined in Perl.

Is there a way to manipulate the regexp above to match 3 times in the given string or do I have to write my own sub for this task?

Any suggestion to do this the "perl way"?



Thanks!

PS: A friend came up with the following, non-experimental :) suggestion

$regexp="a{2}"; $a="aaaa"; @a=$a=~m/(?=($regexp))./g; print join ( "-", @a );
Greetings
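
For reference, here is a minimal sketch of the same look-ahead idea that also records where each match starts. The sub name overlapping_lookahead is made up for illustration. It assumes the pattern never matches the empty string, and it drops the trailing dot, since a /g loop already moves forward after a zero-length match.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [offset, text] pairs for every
# (possibly overlapping) match of $re in $text.
sub overlapping_lookahead {
    my ($text, $re) = @_;
    my @hits;
    # The look-ahead consumes nothing, so the next attempt starts
    # one character further on instead of after the matched text.
    while ($text =~ /(?=($re))/g) {
        push @hits, [ $-[1], $1 ];   # $-[1] is where group 1 starts
    }
    return @hits;
}

print join(" ", map { "$_->[0]:$_->[1]" } overlapping_lookahead("aaaa", "a{2}")), "\n";
# prints: 0:aa 1:aa 2:aa
```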

Replies are listed 'Best First'.
Re: multiple matches with regexp
by thelenm (Vicar) on Oct 10, 2003 at 21:20 UTC

    You can do this by doing a global match and resetting pos on each iteration to one character past the start of the match, like this:

    $_ = "aaaaa"; while (/aa/g) { print "Match at $-[0]\n"; pos = $-[0] + 1; }

    -- Mike
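
If you want the matched strings in an array rather than printed offsets, the pos trick above could be wrapped up roughly like this. It is only a sketch: overlapping_by_pos is a made-up name, and it assumes the pattern cannot match the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every overlapping match of $re in $text.
sub overlapping_by_pos {
    my ($text, $re) = @_;
    my @hits;
    while ($text =~ /$re/g) {
        # @- and @+ hold the start and end offsets of the whole match.
        push @hits, substr($text, $-[0], $+[0] - $-[0]);
        # Restart one character after the START of this match,
        # not after its end, so overlapping matches are found too.
        pos($text) = $-[0] + 1;
    }
    return @hits;
}

print join("-", overlapping_by_pos("aaaaa", qr/aa/)), "\n";
# prints: aa-aa-aa-aa
```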

    --
    XML::Simpler does not require XML::Parser or a SAX parser. It does require File::Slurp.
    -- grantm, perldoc XML::Simpler

Re: multiple matches with regexp
by Aristotle (Chancellor) on Oct 10, 2003 at 17:42 UTC

    This only matches twice because each match continues where the previous one left off. The first match consumes two a's, so you can only match once more.
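
To see the consumption at work, a small sketch like the following prints the start and end offsets of each match, taken from @- and @+. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $text = "aaaa";
my @spans;
while ($text =~ /aa/g) {
    # $-[0] is the offset where the match starts, $+[0] where it ends;
    # the next attempt begins at $+[0].
    push @spans, "$-[0]-$+[0]";
}
print join(", ", @spans), "\n";
# prints: 0-2, 2-4
```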

    Many solutions are possible, but without a better idea of your requirements, most of the propositions won't fit. Do you intend to match fixed strings or are you planning to use true patterns? What does your data look like? What kinds of overlap between matches are possible? There is no universally applicable answer to your problem.

    Makeshifts last the longest.

Re: multiple matches with regexp
by CombatSquirrel (Hermit) on Oct 10, 2003 at 20:23 UTC
    Well, the Perl way is obviously TIMTOWTDI, but I have a Perl-ish RegEx way for you ;-):
    $a="aaaa"; $a=~m/(aa)(?{push @a, $1})(?!)/; print join ( "-", @a );
    This uses (?{}) (just a bit of code within a RegEx that is executed whenever the RE engine runs over it) and (?!) (an empty negative look-ahead), which always fails and so forces the engine to backtrack and retry at the next position (that's a bit of its magic). Both are explained in perlre. You could say it is ugly, but I personally like it :-).
    Hope this helped.
    CombatSquirrel.
    Entropy is the tendency of everything going to hell.
      This is clever, and extended my understanding of the RE engine (++), but is it guaranteed to work?

      I got interested in why a negative look-ahead was required, and found that negative and positive failing look-behinds work too, but a simple mis-match doesn't, and neither does a failing zero-length positive look-ahead: (?=x). For example, m/(aa)(?{push @a, $1})x/ does not work. Presumably the regex optimiser sees that there is no 'x' in 'aaaa', so it doesn't bother with the step-wise attempts to match the 'a's.

      Is it possible a future regex engine will realise that mis-match is inevitable because (?!) will always mis-match, and break this code?

        A too smart RegEx engine would already break the (?{}) part of the code, which is evaluated every time the engine runs over it. The main problem is that (?{}) is an experimental feature which may be changed or deleted in future Perl versions. Still, AFAIK, it is considered useful for some RegExes (the above one is fairly standard) which will hopefully prevent major changes in the syntax. And don't forget we have Perl 6 coming up ;-).
        Cheers,
        CombatSquirrel.
        Entropy is the tendency of everything going to hell.
      I like it, and with
      use re 'eval';
      I was also able to use a regular expression instead of a fixed string: "aa" -> "a{2}"
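
A rough sketch of what that might look like as a sub follows. The names overlapping_with_code_block and @found are invented here. Note that use re 'eval' allows code embedded in the interpolated pattern to run, so this assumes the pattern comes from a trusted source. Newer perls may not need the pragma for this particular case, but it should do no harm. Also, because (?!) forces every backtracking path, a variable-length pattern such as a+ would likely record the shorter alternatives at each position as well.

```perl
use strict;
use warnings;

# A package variable, since lexicals inside (?{ ... }) were
# unreliable on older perls.
our @found;

# Hypothetical helper: every overlapping match of a pattern string.
sub overlapping_with_code_block {
    my ($text, $pattern) = @_;
    @found = ();
    use re 'eval';   # permit a code block next to an interpolated pattern
    # (?!) always fails, so the overall match never succeeds; the
    # engine retries at each start position and the code block runs
    # every time ($pattern) has just matched.
    $text =~ m/($pattern)(?{ push @found, $1 })(?!)/;
    return @found;
}

print join("-", overlapping_with_code_block("aaaa", "a{2}")), "\n";
# prints: aa-aa-aa
```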
Re: multiple matches with regexp
by davido (Cardinal) on Oct 10, 2003 at 20:00 UTC
    You could do it this way. I'm using a manual position rather than utilizing the default characteristics of /g. If there is no match, I advance the position by one character. If I get a match, I advance it by one plus the offset of the beginning of that match. It probably could be golfed down a lot (for instance, the use of $sstr is unnecessary; substr can be bound directly to the regexp). And there is probably not really a need for the "else" condition in the loop, but I just wanted to cover all bases and make it as clear as possible. Here it is...
    use strict;
    use warnings;

    my $string = "aabbaaaabbaaabbbbabaabbbaaaa";
    my @a;
    my $position = 0;
    while ( $position < length $string ) {
        my $sstr = substr($string,$position);
        if ( $sstr =~ /(aa)/ ) {
            push @a, $1;
            $position+=$-[0]+1;
        } else {
            $position++;
        }
    }
    print join("-", @a), "\n";

    Dave
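
The shorter form hinted at above might look something like this sketch: substr is bound directly to the match, and the loop simply stops once the rest of the string has no match, since nothing further along could match either. The variable @found is renamed here for illustration.

```perl
use strict;
use warnings;

my $string   = "aabbaaaabbaaabbbbabaabbbaaaa";
my @found;
my $position = 0;

# Match against the remainder of the string; $-[0] is relative to
# that remainder, so add it (plus one) to the running position.
while ( $position < length $string
        && substr($string, $position) =~ /(aa)/ ) {
    push @found, $1;
    $position += $-[0] + 1;
}

print scalar(@found), " matches: ", join("-", @found), "\n";
# prints: 10 matches: aa-aa-aa-aa-aa-aa-aa-aa-aa-aa
```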


    "If I had my life to do over again, I'd be a plumber." -- Albert Einstein
      For my purposes I stripped this down to
      $regexp="a{2}"; $_="aaaa"; do { push @a, $1 if ( m/^($regexp)/ ) } while ( s/^.// ) ; print join ( "-", @a );
      The main difference is that I match the pattern at the beginning of the string, while you match it anywhere in the substr.

      Thanks for the input.
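
One thing to keep in mind is that the s/^.// loop eats the string as it goes. If the original string is still needed afterwards, a non-destructive variant could anchor the match at each offset with \G instead. This is only a sketch, and anchored_scan is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: try the pattern at every offset, anchored
# there with \G, without modifying the string.
sub anchored_scan {
    my ($text, $re) = @_;
    my @hits;
    for my $start (0 .. length($text) - 1) {
        pos($text) = $start;                       # where \G will anchor
        push @hits, $1 if $text =~ /\G($re)/gc;    # scalar-context /g honours pos
    }
    return @hits;
}

print join("-", anchored_scan("aaaa", "a{2}")), "\n";
# prints: aa-aa-aa
```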