http://www.perlmonks.org?node_id=544751

neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,

I have the following problem. Given a string:
TXXXABCDGXXXCCCDTGYYYCCCYYYCC
I would like to extract all of its substrings that begin with "XXX" and end with "YYY", so that it yields the following result (below is done manually):
XXXABCDGXXXCCCDTGYYYCCCYYY
XXXABCDGXXXCCCDTGYYY
XXXCCCDTGYYYCCCYYY
XXXCCCDTGYYY
However the following code of mine doesn't seem to do the job? What can I do to achieve that?
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Carp;

my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
$str =~ m/(XXX.*?YYY)/g;
print "$1\n";


---
neversaint and everlastingly indebted.......

Replies are listed 'Best First'.
Re: Delimited Backtracking with Regex
by ikegami (Patriarch) on Apr 21, 2006 at 02:06 UTC

    Take advantage of regexp backtracking to find all the possibilities.

    my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";

    local our @matches;
    $str =~ m/
        (XXX.*YYY)                 # Search and capture
        (?{ push @matches, $1 })   # Save result
        (?!)                       # Try again
    /x;

    print "$_\n" foreach @matches;

    outputs

    XXXABCDGXXXCCCDTGYYYCCCYYY
    XXXABCDGXXXCCCDTGYYY
    XXXCCCDTGYYYCCCYYY
    XXXCCCDTGYYY
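For comparison, here is a minimal sketch that produces the same four substrings without embedded code blocks, by scanning with `index` and `substr`. The helper name `all_delimited` is hypothetical, not something from this thread. It treats both delimiters as literal strings, and unlike `.*` it will happily span newlines.

```perl
use strict;
use warnings;

# Hypothetical helper: every substring of $str that starts with $open
# and ends with $close, longest first for each starting position
# (the same order the backtracking regex visits them in).
sub all_delimited {
    my ($str, $open, $close) = @_;
    my @found;
    my $start = -1;
    while (($start = index($str, $open, $start + 1)) >= 0) {
        my @ends;
        # The closing delimiter may not overlap the opening one.
        my $end = $start + length($open) - 1;
        while (($end = index($str, $close, $end + 1)) >= 0) {
            push @ends, $end + length($close);   # offset just past $close
        }
        push @found, map { substr($str, $start, $_ - $start) } reverse @ends;
    }
    return @found;
}

my @found = all_delimited("TXXXABCDGXXXCCCDTGYYYCCCYYYCC", "XXX", "YYY");
print "$_\n" for @found;
```

The regex version is shorter; this one may be easier to follow if the `(?{ ... })(?!)` idiom feels too magical.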

    Update: Followed japhy's suggestion

      I've been asked why I used a package variable instead of a lexical variable. It's because regexps close around the lexicals that exist when they are first run.
      #                                           pass 1   2   3
      #                                                --- --- ---
      sub test {
          my @matches;
          '' =~ /
              (?{ push @matches, 'a' })
              (?{ print(scalar(@matches), "\n") })   # 1   2   3
          /xg;
          print(scalar(@matches), "\n");             # 1   0   0
      }

      test() for 1..3;

      A new variable called @matches is created every time test is called, but the regexp always uses the variable from the first call.
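A minimal sketch of the package-variable workaround wrapped in a sub that gets called more than once. The name `all_xxx_yyy` is hypothetical. As far as I know, perls from 5.18 onward rebind lexicals inside code blocks properly, so the stale-closure behaviour described above may not reproduce there; the `local our` form should behave the same either way.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not from the thread): collect every way
# /XXX.*YYY/ can match, using a package variable as the accumulator.
sub all_xxx_yyy {
    my ($str) = @_;
    local our @matches;              # fresh, empty array for this call
    $str =~ m/
        (XXX.*YYY)                   # capture a candidate
        (?{ push @matches, $1 })     # record it
        (?!)                         # fail, forcing backtracking
    /x;
    my @copy = @matches;             # copy before local() is undone
    return @copy;
}

my @first  = all_xxx_yyy("TXXXABCDGXXXCCCDTGYYYCCCYYYCC");
my @second = all_xxx_yyy("XXXaYYY");
print scalar(@first), " then ", scalar(@second), "\n";
```

Because `local` gives each call its own empty `@matches`, the second call does not see anything left over from the first.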

        You can still do the same thing using lexicals (if you are allergic to using symbol-table variables). The only thing to be careful about is to reuse the same push statement and the same target array every time. This is adapted from the code in Re^3: Regexes: finding ALL matches (including overlap):

        {
            my @matches;
            my $push = qr/(?{ push @matches, $1 })/;

            sub match_all_ways {
                my ($string, $regex) = @_;
                @matches = ();
                $string =~ m/($regex)$push(?!)/;
                return @matches;
            }
        }

        print match_all_ways( "TXXXABCDGXXXCCCDTGYYYCCCYYYCC", qr/XXX.*YYY/ );

        Technically, $push is not needed: you could just include (?{push @matches, $1}) inline in the m// statement. However, I like this way, as it makes it much more obvious that this part of the regex is only compiled once.

        blokhead

      Dear ikegami,
      To confirm my understanding of the use of "local our":
      You mean in this context, right?
      foreach my $ss (@str_set) {
          local our @matches;
          $ss =~ m/
              (XXX.*YYY)                 # Search and capture
              (?{ push @matches, $1 })   # Save result
              (?!)                       # Try again
          /x;
          print Dumper \@matches;
      }
      Namely, if I use "my" instead, it will return:

      $VAR = [
          # with something
      ]
      # and then empty....
      $VAR = [];
      $VAR = [];
      etc...


      ---
      neversaint and everlastingly indebted.......
Re: Delimited Backtracking with Regex
by McDarren (Abbot) on Apr 21, 2006 at 01:59 UTC
    I'm not sure if I understand this correctly - but I suspect that all you really need to do is split. eg:

    Update: - looks like I obviously didn't even remotely understand what you were after, so just pretend I didn't post the below :)

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper::Simple;

    my $string = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
    my @wanted = split /XXX|YYY/, $string;
    print Dumper(@wanted);
    Gives:
    @wanted = (
        'T',
        'ABCDG',
        'CCCDTG',
        'CCC',
        'CC'
    );
    Is that what you were after?

    Cheers,
    Darren :)