http://www.perlmonks.org?node_id=1022226

jaiieq has asked for the wisdom of the Perl Monks concerning the following question:

Say I have the following string: AAABCDAAADCBAAABBDAAA

I need to extract all instances of AAA(anything)AAA, so I used the following to try and do that:

my $string = 'AAABCDAAADCBAAABBDAAA';
my @matches = $string =~ /AAA\w+AAA/g;

The only result returned is the full string, whereas I need:

AAABCDAAA
AAADCBAAA
AAABBDAAA
AAABCDAAADCBAAA
AAADCBAAABBDAAA
AAABCDAAADCBAAABBDAAA

Any ideas?

Re: Perl Pattern Matching & RegEx's
by choroba (Cardinal) on Mar 07, 2013 at 14:29 UTC
    The main problem is that matches in /g cannot overlap. This can be solved by using look-ahead, though:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw(say);

    my $string = 'AAAbcdAAAdcbAAAbbdAAAxAAAA';
    my $delimiter = 'AAA';

    my @positions;
    push @positions, pos($string) while $string =~ /(?=$delimiter)/g;

    for my $from (@positions) {
        for my $to (grep $_ - length $delimiter > $from, @positions) {
            say substr($string, $from, $to - $from) . $delimiter;
        }
    }

    Update: Typo fixed. Thanks jaiieq, damn netbooks.
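
A minimal sketch of the overlap issue described above, using the string from the question. The array names are made up for illustration. The first pattern consumes each closing AAA, so the middle match is skipped; the second wraps the pattern in a zero-width look-ahead with a capture, which finds one (shortest) match per start position but still not the longer combinations.

```perl
use strict;
use warnings;

my $string = 'AAABCDAAADCBAAABBDAAA';

# Ordinary /g matching consumes what it matches, so the closing AAA of one
# match cannot also serve as the opening AAA of the next one.
my @consumed = $string =~ /AAA\w+?AAA/g;

# A capture inside a zero-width look-ahead consumes nothing, so every start
# position is tried. Only the shortest match per position is captured.
my @overlapping = $string =~ /(?=(AAA\w+?AAA))/g;

print "consumed:    @consumed\n";
print "overlapping: @overlapping\n";
```

This is why the code above records positions and builds every from/to pair with substr rather than relying on a single regex.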

    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      This is exactly what I needed. Which also gives me the ability to easily change the delimiter to say 'AA' and produce the output I need. Thank you.

      There is a small typo in your code: you have a quote after $delimiter in the say line.

      Wow, substr is so much better here than split/join!
Re: Perl Pattern Matching & RegEx's
by Dallaylaen (Chaplain) on Mar 07, 2013 at 14:16 UTC

    Why not split the string into chunks delimited by AAA, and then combine the chunks as you want and join them back? As in:

    #!/usr/bin/perl -w
    use strict;

    my $string = shift || 'AAABCDAAADCBAAABBDAAA';

    my @between = split /AAA/, $string, -1;
    pop @between;
    shift @between;

    for (my $i = 0; $i < @between; $i++) {
        for (my $j = $i; $j < @between; $j++) {
            print join "AAA", "", @between[ $i .. $j ], "\n";
        }
    }

    This won't solve the problem if your string contains AAAA, though.

    UPDATE: choroba's substr-based solution is much better: it doesn't suffer from the AAAA problem and probably uses less memory, too.
      This looks to be exactly what I was looking for. I am going to try it on a few other test cases and see how it works. Thank you!
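
To make the AAAA caveat concrete, here is a rough comparison of the two approaches on a short string ending in AAAA. The sub names by_split and by_positions are hypothetical wrappers written for this illustration, and the delimiter is hard-coded as AAA.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the split/join idea: cut at AAA, then rejoin
# every contiguous run of inner chunks.
sub by_split {
    my ($string) = @_;
    my @between = split /AAA/, $string, -1;
    pop @between;
    shift @between;
    my @out;
    for my $i (0 .. $#between) {
        for my $j ($i .. $#between) {
            push @out, join 'AAA', '', @between[ $i .. $j ], '';
        }
    }
    return @out;
}

# Hypothetical wrapper around the position/substr idea: record every offset
# where AAA starts, then pair each start with every later, non-adjacent one.
sub by_positions {
    my ($string) = @_;
    my @positions;
    push @positions, pos($string) while $string =~ /(?=AAA)/g;
    my @out;
    for my $from (@positions) {
        for my $to (grep { $_ - 3 > $from } @positions) {
            push @out, substr($string, $from, $to - $from) . 'AAA';
        }
    }
    return @out;
}

print "split:     $_\n" for by_split('AAAxAAAA');
print "positions: $_\n" for by_positions('AAAxAAAA');
```

With AAAxAAAA, split sees only one delimiter in the trailing AAAA, while the look-ahead finds a delimiter starting at each of the two possible offsets, so the position-based version also reports AAAxAAAA.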
Re: Perl Pattern Matching & RegEx's
by Athanasius (Archbishop) on Mar 07, 2013 at 14:28 UTC

    Here is a regex-based solution. As Anonymous Monk has pointed out, the non-greedy quantifier ? is an important component. But to get all the matches, you need to loop:

    #! perl
    use strict;
    use warnings;

    my %matches;
    my $s = 'zzAAABCDAAADCBAAABBDAAA';
    my $t = $s =~ s/^[^A]*?(AAA.*)/$1/r;

    while ($t =~ /^AAA.+?AAA/) {
        my $u = $t;
        while ($u =~ /^(AAA.+?AAA)/) {
            my $match = $1;
            $match =~ s/\|/AAA/g;
            ++$matches{$match};
            $u =~ s/(AAA.+?)AAA/$1\|/;
        }
        $t =~ s/^AAA.+?(AAA.*)/$1/;
    }

    print $_, "\n" for sort keys %matches;

    Output:

    0:15 >perl 563_SoPW.pl
    AAABBDAAA
    AAABCDAAA
    AAABCDAAADCBAAA
    AAABCDAAADCBAAABBDAAA
    AAADCBAAA
    AAADCBAAABBDAAA

    0:23 >

    The inner loop finds successively longer matches by changing the AAA at the end of each match into a non-word character. The outer loop truncates the search string by removing everything up to the second AAA.
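
As a rough illustration of the inner loop only, the sketch below records the raw text captured on each pass, before the | placeholders are turned back into AAA. The @trace array is added here purely for inspection, and the sketch assumes the input contains no literal | character.

```perl
use strict;
use warnings;

my $u = 'AAABCDAAADCBAAABBDAAA';
my @trace;    # illustrative only: raw captures, placeholders still in place

while ($u =~ /^(AAA.+?AAA)/) {
    push @trace, $1;
    # Replace the AAA that ended this match with '|', so the next pass has
    # to reach further right to find a closing AAA.
    $u =~ s/(AAA.+?)AAA/$1|/;
}

print "$_\n" for @trace;
```

Each pass extends the match by one chunk; substituting AAA back for | in each captured string gives the three matches that start at the first AAA.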

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Perl Pattern Matching & RegEx's
by Anonymous Monk on Mar 07, 2013 at 14:00 UTC

    You want to use +? as in \w+?; see perlfaq6 and perlrequick.

    The awkward-to-use and outdated YAPE::Regex::Explain can help explain

    the problem (\w+)

    the solution (\w+?)
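
A small sketch of the difference between the two quantifiers on the string from the question. The variable names are illustrative. Note that the non-greedy form only shortens a single match; on its own it does not produce overlapping or combined matches.

```perl
use strict;
use warnings;

my $string = 'AAABCDAAADCBAAABBDAAA';

# Greedy: \w+ grabs as much as it can, so the match runs to the last AAA.
my ($greedy) = $string =~ /(AAA\w+AAA)/;

# Non-greedy: \w+? grabs as little as it can, so the match stops at the
# first closing AAA.
my ($lazy) = $string =~ /(AAA\w+?AAA)/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```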