
Delimited Backtracking with Regex

by neversaint (Deacon)
on Apr 21, 2006 at 01:52 UTC (#544751=perlquestion)
neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,

I have the following problem. Given a string:
I would like to extract all of its substrings that begin with "XXX" and end with "YYY", so that it yields the following result (the result below was produced manually):
However, the following code of mine doesn't seem to do the job. What can I do to achieve that?
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Carp;

my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
$str =~ m/(XXX.*?YYY)/g;
print "$1\n";

neversaint and everlastingly indebted.......

Re: Delimited Backtracking with Regex
by ikegami (Pope) on Apr 21, 2006 at 02:06 UTC

    Take advantage of regexp backtracking to find all the possibilities.

    my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
    local our @matches;
    $str =~ m/
        (XXX.*YYY)                  # Search and capture
        (?{ push @matches, $1 })    # Save result
        (?!)                        # Try again
    /x;
    print "$_\n" foreach @matches;
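    For comparison, the same substrings can also be enumerated without embedded code blocks. The following is only a minimal sketch and is not taken from the thread: the helper name delimited_substrings is hypothetical, and it pairs each "XXX" with every later "YYY" using index() instead of regexp backtracking. Note that it lists the shortest candidate first for each starting point, whereas a greedy .* finds the longest first, and that index() ignores newlines while . without /s would not cross one.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): pair every opening delimiter
# with every closing delimiter that starts after the opening one ends.
sub delimited_substrings {
    my ($str, $open, $close) = @_;
    my @found;
    my $start = -1;
    while (($start = index($str, $open, $start + 1)) >= 0) {
        # Last character of the opening delimiter; the closing
        # delimiter may begin right after it.
        my $end = $start + length($open) - 1;
        while (($end = index($str, $close, $end + 1)) >= 0) {
            push @found, substr($str, $start, $end + length($close) - $start);
        }
    }
    return @found;
}

my @subs = delimited_substrings("TXXXABCDGXXXCCCDTGYYYCCCYYYCC", "XXX", "YYY");
print "$_\n" for @subs;
```

    On the example string this should give the same four substrings as the backtracking approach, which makes it usable as a cross-check.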



    Update: Followed japhy's suggestion

      I've been asked why I used a package variable instead of a lexical variable. It's because regexps close around the lexicals that exist when they are first run.
      # pass                                          1   2   3
      #                                              --- --- ---
      sub test {
          my @matches;
          '' =~ /
              (?{ push @matches, 'a' })
              (?{ print(scalar(@matches), "\n") })  # 1   2   3
          /xg;
          print(scalar(@matches), "\n");            # 1   0   0
      }

      test() for 1..3;

      A variable called @matches is created every time test is called. The regexp always uses the variable from the first call.
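      The package-variable alternative can be checked with a variation of test(). This is a sketch under assumptions: count_matches is a hypothetical name, and the count is returned rather than printed. Because the code block looks up @matches by name in the symbol table, each call sees the array that local has just emptied, so the count should not depend on which call compiled the regexp.

```perl
use strict;
use warnings;

our @matches;

# Hypothetical variant of test() that uses a localized package array.
sub count_matches {
    local @matches;    # fresh, empty package array for the duration of this call
    '' =~ /
        (?{ push @matches, 'a' })
    /x;
    return scalar @matches;
}

my @counts = map { count_matches() } 1 .. 3;
print "@counts\n";
```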

        You can still do the same thing using lexicals (if you are allergic to using symbol-table variables). The only thing to be careful about is to reuse the same push statement and the same target array every time. This is adapted from the code in Re^3: Regexes: finding ALL matches (including overlap):
        {
            my @matches;
            my $push = qr/(?{ push @matches, $1 })/;

            sub match_all_ways {
                my ($string, $regex) = @_;
                @matches = ();
                $string =~ m/($regex)$push(?!)/;
                return @matches;
            }
        }

        print match_all_ways( "TXXXABCDGXXXCCCDTGYYYCCCYYYCC", qr/XXX.*YYY/ );
        Technically, $push is not needed -- you could just include (?{push @matches, $1}) in the m// statement inline. However, I like this way as it makes it much more obvious that this part of the regex is only compiled once.
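        The inline variant might look like the sketch below. It assumes a reasonably recent Perl (5.18 or later, where literal code blocks are compiled together with the surrounding code); the name match_all_ways_inline is hypothetical, and the second test string is made up for illustration.

```perl
use strict;
use warnings;

{
    my @matches;

    # Hypothetical inline variant: the code block is written directly
    # in the m// instead of being interpolated from $push.
    sub match_all_ways_inline {
        my ($string, $regex) = @_;
        @matches = ();    # reset the shared array on every call
        $string =~ m/($regex)(?{ push @matches, $1 })(?!)/;
        return @matches;
    }
}

my @first  = match_all_ways_inline("TXXXABCDGXXXCCCDTGYYYCCCYYYCC", qr/XXX.*YYY/);
my @second = match_all_ways_inline("aXXXbYYYc", qr/XXX.*YYY/);
print scalar(@first), " and ", scalar(@second), " matches\n";
```

        Calling it twice is a quick way to confirm that the array is reset between calls and that later calls still push into the same array.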


      Dear ikegami,
      To confirm my understanding of the use of "local our":
      You mean in this context, right?
      foreach my $ss (@str_set) {
          local our @matches;
          $ss =~ m/
              (XXX.*YYY)                  # Search and capture
              (?{ push @matches, $1 })    # Save result
              (?!)                        # Try again
          /x;
          print Dumper \@matches;
      }
      Namely, if I use "my" instead it will return:
      $VAR = [
          # with something
      ]
      # and then empty....
      $VAR = [];
      $VAR = [];
      etc...

      neversaint and everlastingly indebted.......
Re: Delimited Backtracking with Regex
by McDarren (Abbot) on Apr 21, 2006 at 01:59 UTC
    I'm not sure if I understand this correctly - but I suspect that all you really need to do is split. eg:

    Update: - looks like I obviously didn't even remotely understand what you were after, so just pretend I didn't post the below :)

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper::Simple;

    my $string = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
    my @wanted = split /XXX|YYY/, $string;
    print Dumper(@wanted);
    @wanted = ( 'T', 'ABCDG', 'CCCDTG', 'CCC', 'CC' );
    Is that what you were after?

    Darren :)

Node Type: perlquestion [id://544751]
Approved by GrandFather