PerlMonks  

Delimited Backtracking with Regex

by neversaint (Deacon)
on Apr 21, 2006 at 01:52 UTC (node #544751)
neversaint has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters,

I have the following problem. Given a string:
TXXXABCDGXXXCCCDTGYYYCCCYYYCC
I would like to extract all of its substrings that begin with "XXX" and end with "YYY", so that it yields the following result (worked out by hand):
XXXABCDGXXXCCCDTGYYYCCCYYY
XXXABCDGXXXCCCDTGYYY
XXXCCCDTGYYYCCCYYY
XXXCCCDTGYYY
However, the following code of mine doesn't do the job. What can I do to achieve that?
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Carp;

my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
$str =~ m/(XXX.*?YYY)/g;
print "$1\n";


---
neversaint and everlastingly indebted.......
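The attempt above can only ever report one substring. m//g never revisits text it has already consumed, and .*? stops at the first YYY, so nested or overlapping matches cannot come out of it. The following minimal sketch is not from the thread. It collects every /g match in list context and shows that only one substring is found for this string:

```perl
use strict;
use warnings;

my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";

# In list context m//g returns $1 from every successive match. Each new
# search resumes where the previous match ended (just past the first YYY),
# and no XXX follows that point, so the list has a single element.
my @found = $str =~ m/(XXX.*?YYY)/g;

print "$_\n" for @found;    # XXXABCDGXXXCCCDTGYYY
```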

Re: Delimited Backtracking with Regex
by McDarren (Abbot) on Apr 21, 2006 at 01:59 UTC
    I'm not sure I understand this correctly, but I suspect that all you really need is split. E.g.:

    Update: looks like I didn't even remotely understand what you were after, so just pretend I didn't post the below :)

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper::Simple;

    my $string = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
    my @wanted = split /XXX|YYY/, $string;
    print Dumper(@wanted);
    Gives:
    @wanted = ( 'T', 'ABCDG', 'CCCDTG', 'CCC', 'CC' );
    Is that what you were after?

    Cheers,
    Darren :)

Re: Delimited Backtracking with Regex
by ikegami (Pope) on Apr 21, 2006 at 02:06 UTC

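    Take advantage of regexp backtracking to find all the possibilities. (?!) can never match, so the engine keeps backtracking into .* and retrying at later start positions, and the (?{ }) block records each candidate along the way.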

    my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";

    local our @matches;
    $str =~ m/
        (XXX.*YYY)                 # Search and capture
        (?{ push @matches, $1 })   # Save result
        (?!)                       # Try again
    /x;

    print "$_\n" foreach @matches;

    outputs

    XXXABCDGXXXCCCDTGYYYCCCYYY
    XXXABCDGXXXCCCDTGYYY
    XXXCCCDTGYYYCCCYYY
    XXXCCCDTGYYY

    Update: Followed japhy's suggestion
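If the (?{ }) machinery feels too magical, you can enumerate the same four substrings directly from the delimiter positions. This is a swapped-in technique, not something anyone in the thread posted. delimited_substrings is a hypothetical name. Unlike . in the regex, this version does not stop at newlines:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): every substring of $str that
# starts with $open and ends with $close, found with index() instead of a regex.
sub delimited_substrings {
    my ($str, $open, $close) = @_;
    die "delimiters must be non-empty\n" unless length($open) && length($close);

    # All positions where each delimiter occurs, overlapping ones included.
    my (@starts, @ends);
    for (my $p = index($str, $open); $p >= 0; $p = index($str, $open, $p + 1)) {
        push @starts, $p;
    }
    for (my $p = index($str, $close); $p >= 0; $p = index($str, $close, $p + 1)) {
        push @ends, $p;
    }

    my @out;
    for my $s (@starts) {
        # Longest first, the same order in which greedy .* backtracks.
        for my $e (reverse @ends) {
            # The closing delimiter must begin after the opening one ends.
            next if $e < $s + length($open);
            push @out, substr($str, $s, $e + length($close) - $s);
        }
    }
    return @out;
}

print "$_\n"
    for delimited_substrings("TXXXABCDGXXXCCCDTGYYYCCCYYYCC", "XXX", "YYY");
```

For this input it yields the same four strings in the same order as the backtracking regex.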

      Dear ikegami,
      To confirm my understanding of the use of "local our":
      you mean in this context, right?
      foreach my $ss (@str_set) {
          local our @matches;
          $ss =~ m/
              (XXX.*YYY)                 # Search and capture
              (?{ push @matches, $1 })   # Save result
              (?!)                       # Try again
          /x;
          print Dumper \@matches;
      }
      Namely, if I use "my" instead, it returns:
      $VAR1 = [ ... ];   # with something
      # and then empty....
      $VAR1 = [];
      $VAR1 = [];
      etc...


      I've been asked why I used a package variable instead of a lexical one. It's because regexps close over the lexicals that exist when they are first run.
                                                      # pass   1   2   3
                                                      #       --- --- ---
      sub test {
          my @matches;
          '' =~ /
              (?{ push @matches, 'a' })
              (?{ print(scalar(@matches), "\n") })    #  1   2   3
          /xg;
          print(scalar(@matches), "\n");              #  1   0   0
      }

      test() for 1..3;

      A new @matches is created every time test is called, but the regexp always uses the variable from the first call.
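The local our advice can be wrapped in a subroutine and called once per string, as in the foreach loop above. The following is a minimal sketch; all_delimited is a hypothetical name, and the regex is ikegami's. @matches is a package variable, so the (?{ }) block always refers to the current one. local gives every call a fresh, empty array and restores the old value on return:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the backtracking regex from this thread.
sub all_delimited {
    my ($str) = @_;
    local our @matches;             # fresh package array for this call only
    $str =~ m/
        (XXX.*YYY)                  # search and capture
        (?{ push @matches, $1 })    # save result
        (?!)                        # always fail, forcing more backtracking
    /x;
    return @matches;                # copied out before local is undone
}

for my $ss ("TXXXABCDGXXXCCCDTGYYYCCCYYYCC", "XXXaYYY", "no delimiters") {
    my @m = all_delimited($ss);
    print scalar(@m), " match(es) in '$ss'\n";
}
```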

        You can still do the same thing using lexicals (if you are allergic to using symbol-table variables). The only thing to be careful about is to reuse the same push statement and the same target array every time. This is adapted from the code in Re^3: Regexes: finding ALL matches (including overlap):
        {
            my @matches;
            my $push = qr/(?{ push @matches, $1 })/;

            sub match_all_ways {
                my ($string, $regex) = @_;
                @matches = ();
                $string =~ m/($regex)$push(?!)/;
                return @matches;
            }
        }

        print match_all_ways( "TXXXABCDGXXXCCCDTGYYYCCCYYYCC", qr/XXX.*YYY/ );
        Technically, $push is not needed: you could just include (?{push @matches, $1}) inline in the m// statement. However, I like this way, as it makes it much more obvious that this part of the regex is only compiled once.

        blokhead
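blokhead's sub accepts any pattern, which also makes it a handy way to get overlapping matches. The following is a sketch with the same design; every_match is a hypothetical name. With a non-greedy .*? the same four substrings come back, but the shortest comes first for each starting XXX. With qr/\d\d/ it returns every overlapping pair of digits:

```perl
use strict;
use warnings;

# Created once: the qr// holding the (?{ }) block always pushes onto this
# one @found, so there is no stale-closure problem.
my @found;
my $save = qr/(?{ push @found, $1 })/;

# Hypothetical name: every way $regex can match $string, overlaps included.
sub every_match {
    my ($string, $regex) = @_;
    @found = ();                          # reset the shared array
    $string =~ m/($regex)$save(?!)/;      # (?!) forces full backtracking
    return @found;
}

my $str = "TXXXABCDGXXXCCCDTGYYYCCCYYYCC";
print join(" ", every_match($str, qr/XXX.*?YYY/)), "\n";
print join(" ", every_match("12345", qr/\d\d/)), "\n";
```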

Node Type: perlquestion [id://544751]
Approved by GrandFather