Beefy Boxes and Bandwidth Generously Provided by pair Networks vroom
Do you know where your variables are?
 
PerlMonks  

Help modifying recursive regex

by beeflobill (Initiate)
on May 07, 2013 at 18:41 UTC ( #1032522=perlquestion: print w/ replies, xml ) Need Help??
beeflobill has asked for the wisdom of the Perl Monks concerning the following question:

A long time ago I worked really hard and wrote a piece of recursive regex code which I'm trying to modify now. It takes text with nested parentheses and returns a list with the "top" groups of parentheses.

Here is an example which shows what the regex does:
Input: "A (B C D)"
Output: "B C D"

Input: "A (B C D) (E F G)"
Output: "B C D", "E F G"

Input: "A B (C (D E) (F G)) (H I)"
Output: "C (D E) (F G)", "H I"

Even when I wrote the code I knew I would really, really want to keep the data that wasn't in parentheses too, but I couldn't figure out how to do it. Now I'm back on the problem.

Here is an example of what I am trying to get:
Input: "A (B C D)"
Output: "A", "B C D"

Input: "A (B C D) (E F G)"
Output: "A", "B C D", "E F G"

Input: "A B (C (D E) (F G)) (H I)"
Output: "A", "B", "C (D E) (F G)", "H I"

I still don't know how to do that. I've provided a test script demonstrating the subroutine that does what I'm describing. May I supplicate, how might I modify this to do what I need? Or, is there simply a better way to approach this problem? Thanks.

#!/usr/bin/perl -w use strict; sub strip_paren { my $input = $_[0]; my $re; $re = qr/ (?: \( ( (?: [^()]+ | (??{$re}) )+ ) \) ) | \(\) /x; my @output = $input =~ /$re/g; return @output; } sub print_crud { print shift, "\n"; foreach my $line (@_) { print " $line\n"; } print "\n"; } my $string = "A (B C D)"; print_crud($string, strip_paren($string)); $string = "A (B C D) (E F G)"; print_crud($string, strip_paren($string)); $string = "A B (C (D E) (F G)) (H I)"; print_crud($string, strip_paren($string));

Comment on Help modifying recursive regex
Download Code
Re: Help modifying recursive regex
by hdb (Parson) on May 07, 2013 at 19:11 UTC

    Check the following Re^2: splitting data. Changing the code slightly will give you trees based on levels of parantheses.

    use strict; use warnings; while(my $input = <DATA>){ print "input: $input\n"; my $level = 0; my $tab = "| "; my %action = ( '(' => sub { print "\n", $tab x ++$level, shift }, ')' => sub { print "\n", $tab x $level--, shift }, 'default' => sub { print shift }, ); ( $action{$_} // $action{'default'} )->($_) for $input =~ /./g; print "\n\n\n"; } __DATA__ A (B C D) A (B C D) (E F G) A B (C (D E) (F G)) (H I)

    with the following output

    input: A (B C D) A | (B C D | ) input: A (B C D) (E F G) A | (B C D | ) | (E F G | ) input: A B (C (D E) (F G)) (H I) A B | (C | | (D E | | ) | | (F G | | ) | ) | (H I | )

    not based on regex though... but if you now replace parantheses on level 1 with quotes and put quotes around words on level 0 you should have what you want. Just change the subs in %action as required.

Re: Help modifying recursive regex
by choroba (Abbot) on May 07, 2013 at 20:06 UTC
    Not using a regex, but a simple automaton:
    #!/usr/bin/perl use warnings; use strict; use Test::More; sub groups { my $string = shift; my $depth = 0; my @return = (q()); for my $char (split //, $string) { $depth++ if '(' eq $char; $depth-- if ')' eq $char; if ((2 > $depth and qw/) (/[$depth] eq $char) or (' ' eq $char and ! $depth)) { push @return, q(); } else { $return[-1] .= $char; } } s/^ *| *$//g for @return; # trim return [ grep length, @return]; } is_deeply(groups("A (B C D)"), ["A", "B C D"]); is_deeply(groups("A (B C D) (E F G)"), ["A", "B C D", "E F G"]); is_deeply(groups("A B (C (D E) (F G)) (H I)"), ["A", "B", "C (D E) (F G)", "H I"]); is_deeply(groups("A B (C (D E) (F G)) X Y (H I)"), ["A", "B", "C (D E) (F G)", "X", "Y", "H I"]); done_testing();
    لսႽ ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Help modifying recursive regex
by gurpreetsingh13 (Scribe) on May 08, 2013 at 05:25 UTC
    Yes, splitting is the simple solution.
    sub get_groups { my @groups = (); my $level = 0; my $group = ""; #Keep on increasing the level once you find the opening brace. #After coming out of loop, level should be again 0 foreach my $val ( split //, shift ) { next if $val =~/\s/; if ( $level == 0 && !($val ~~ [ "(", ")" ]) ) { push @groups, +$val; next; } if ( $level != 0 && !($val ~~ [ "(", ")" ]) ) { $group.=$val; +next; } if ( $val eq "(" && $level==0) { $level++;$group = ""; next; } if ( $val eq "(" && $level!=0) { $level++;$group .= "("; next; + } if ( $val eq ")" && $level==1) {$level--;push @groups, $group; + $group = ""; next; } if ( $val eq ")" && $level!=1) {$level--;$group .= ")"; next; +} } print Dumper(@groups); } ## ---------- end sub get_groups

    Input:

    my $string = "A (B C D)"; get_groups($string); $string = "A (B C D) (E F G)"; get_groups($string); $string = "A B (C (D E) (F G)) (H I)"; $string = "(A B) M N ((C (D E)) (F G)) (Z(H I)) T R"; get_groups($string);

    Output: $VAR1 = 'A'; $VAR2 = 'BCD'; $VAR1 = 'A'; $VAR2 = 'BCD'; $VAR3 = 'EFG'; $VAR1 = 'AB'; $VAR2 = 'M'; $VAR3 = 'N'; $VAR4 = '(C(DE))(FG)'; $VAR5 = 'Z(HI)'; $VAR6 = 'T'; $VAR7 = 'R';

Re: Help modifying recursive regex
by Laurent_R (Priest) on May 08, 2013 at 13:02 UTC

    Thank you, HDB and Choroba, for your elegant and enlightening solutions.

    I had to do something similar a few weeks ago, and my solution was far more complicated, at least 3 times as much code. I wish I had thought about something similar to your proposals (and hope I will next time). I really learned something from your posts.

Re: Help modifying recursive regex
by Krambambuli (Deacon) on May 08, 2013 at 16:49 UTC
    Here's a regexp based solution, with minor tweeks on your code:

    #!/usr/bin/perl use strict; use warnings; use Data::Dumper; sub strip_paren { my $input = $_[0]; our $re; $re = qr/ ( [^()]+ ) (?: \( ( (?: (??{$re}) )+ )+ \) )? /x; my @output = (); @output = $input =~ /$re/g; return @output; } sub print_crud { print "\nLine: ", shift, "\n"; foreach my $line (@_) { next if not defined $line; print " '$line'\n"; } } while (<DATA>) { chomp; print_crud( $_, strip_paren($_)); } exit; __DATA__ A A B A B C (A B) C A (B) C A (B C D) A(B C D) A (B C D) E A (B C D)E A (B C D) (E F G) A (B C D) (E F G) I A B (C (D E) (F G)) (H I) A B (C (D E) (F G)) (H I) J
    I think it comes close to what you wanted.
Re: Help modifying recursive regex
by vagabonding electron (Friar) on May 09, 2013 at 17:20 UTC

    I have tried this with Text::Balanced and it seems to work:

    #!/usr/bin/perl use strict; use warnings; use Text::Balanced ':ALL'; while ( my $text = <DATA> ) { my @data = extract_multiple( $text, [ \&extract_bracketed, qr/\s+/ ] ); @data = grep { !/^$/ } map {s/^\s+//;s/\s+$//;s/^\(//;s/\)$//;$_} @data; print join ( '---', @data ), "\n"; } __END__ A A B A B C (A B) C A (B) C A (B C D) A(B C D) A (B C D) E A (B C D)E A (B C D) (E F G) A (B C D) (E F G) I A B (C (D E) (F G)) (H I) A B (C (D E) (F G)) (H I) J A B (C (D E) (F G)) (H I) J K

    because it prints:

    A A---B A---B---C A B---C A---B---C A---B C D A---B C D A---B C D---E A---B C D---E A---B C D---E F G A---B C D---E F G---I A---B---C (D E) (F G)---H I A---B---C (D E) (F G)---H I---J A---B---C (D E) (F G)---H I---J---K

    There is an issue though (perhaps due to my lack of deep knowledge of the module). If I put a line print  $@->{error}, "\n" if $@; after the first line in the loop it prints Did not find opening bracket after prefix: "\s*" every time. Is this something to worry about or just to take note of?

    Just my 2 cents.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1032522]
Approved by davies
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others examining the Monastery: (8)
As of 2014-04-19 11:05 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    April first is:







    Results (480 votes), past polls