PerlMonks  

Regex: first match that is not enclosed in parenthesis

by monkprentice (Novice)
on Jun 30, 2013 at 09:13 UTC ( #1041562=perlquestion )
monkprentice has asked for the wisdom of the Perl Monks concerning the following question:

Hi, suppose I have a term like this:

"1*(2+3)*(3+4)+5*(6+7)"

and I need to find the first occurrence of "+" that is not enclosed within parentheses, i.e. the one before "5". How would I do that with a regular expression?
Is it possible?

I tried something like:

$term =~ /((?:[^\+]|(?:\(.*\)))*)$_(.*)/

Which is supposed to mean:
Skip as many sub-patterns of the form ("no plus signs" OR "any subterm in parenthesis")
then match the rest in a group.

What it actually does is stop at the first "+" it comes across.
I had hoped that the greedy behaviour would make it match the second option (any subterm in parentheses) instead and skip the (2+3) part entirely.

I have solved it with a loop for now, but I was wondering whether there is an elegant way to do it with a regular expression.

thanks in advance,
mp

Re: Regex: first match that is not enclosed in parenthesis
by james2vegas (Chaplain) on Jun 30, 2013 at 10:17 UTC
    You might want to split the string using Text::Balanced's extract_multiple function and then apply a map to extract your data (this code extracts the first '+' and replaces it with a '-'):
        use strict;
        use warnings;
        use List::Util qw/first/;
        use Text::Balanced qw/extract_multiple extract_bracketed/;

        my $text = "1*(2+3)*(3+4)+5*(6+7)+42";

        # Each extractor is a one-key hash, so every extracted piece comes
        # back as a scalar reference blessed into the class named by the key.
        my @results = extract_multiple(
            $text,
            [
                { Bracketed        => sub { extract_bracketed( $_[0], '()' ) } },
                { PlusOperator     => qr{[+]} },
                { MultiplyOperator => qr{[*]} },
                { Number           => qr{\d+(?:\.\d+)?} },
            ]
        );

        # $operator is a blessed reference; $$operator holds '+'
        my $operator = first { ref($_) eq 'PlusOperator' } @results;

        # Replace only the first top-level '+' with '-'
        my $plus;
        my $new_string = join(
            '',
            map {
                if ( !$plus && ref $_ eq 'PlusOperator' ) { $plus = 1; '-' }
                else                                      { $$_ }
            } @results
        );
        print "$new_string\n";
    Otherwise the answer is 'no': regexes that parse data like that are never going to be elegant.
Re: Regex: first match that is not enclosed in parenthesis
by rjt (Deacon) on Jun 30, 2013 at 11:31 UTC

    If, as your example suggests, you do not need arbitrarily nested parens, the following will work:

        use v5.10;    # enables say
        my $str = "1*(2+3)*(3+4)+5*(6+7)";
        $str =~ /(?: \( .+? \) | [^+()] )+ (\+ .*)/x;
        say "Match: $1";

    Outputs:

    Match: +5*(6+7)

    However, if you need to handle nested parens, where a + sign immediately follows an inner closing paren, such as ((3+4)+2), you can use a recursive regexp:

        $str = "1*(2+3)*(7+(3+4)+2)+5*(6+7)";
        $str =~ /( \( (?: [^()]++ | (?1) )* \) | [^+()] )+ (\+ .*)/x;
        say "Match: $2";

    Outputs:

    Match: +5*(6+7)

    Update: Here's a more descriptive version of that last regexp:
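
    A sketch of how that pattern might be spelled out with /x comments, not necessarily the form rjt posted. The group names prefix, paren and rest, the leading ^ anchor, and the helper split_at_first_plus are assumptions added for illustration:

```perl
use strict;
use warnings;
use v5.10;    # recursive patterns, possessive quantifiers, named groups, say

my $re = qr{
    ^
    (?<prefix>
        (?:
            (?<paren>               # a balanced parenthesised group:
                \(
                (?:
                    [^()]++         #   a run of non-paren characters
                  |
                    (?&paren)       #   or a nested balanced group
                )*
                \)
            )
          |
            [^+()]                  # or one char that is neither a plus nor a paren
        )*
    )
    (?<rest> \+ .* )                # the first top-level plus and all that follows
}x;

# Hypothetical helper: returns (prefix, rest) or an empty list if nothing matches.
sub split_at_first_plus {
    my ($str) = @_;
    return $str =~ $re ? ( $+{prefix}, $+{rest} ) : ();
}

my ( $prefix, $rest ) = split_at_first_plus("1*(2+3)*(7+(3+4)+2)+5*(6+7)");
say "Prefix: $prefix";
say "Match: $rest";
```

    The named recursion (?&paren) plays the same role as (?1) in the compact version, and capturing the prefix as well gives both halves of the term in one match.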

Re: Regex: first match that is not enclosed in parenthesis
by monkprentice (Novice) on Jun 30, 2013 at 12:26 UTC

    Yes, nested parentheses are an additional requirement. I forgot to mention it, sorry.

    Thanks for both responses, this will nicely replace my iterative approach.

Re: Regex: first match that is not enclosed in parenthesis
by LanX (Canon) on Jun 30, 2013 at 12:51 UTC
    > and I need to find the first occurrence of "+" that is not enclosed within parenthesis, i.e. the one before "5", how would I do that with a regular expression ?

    Depends on what you mean by "finding"!

    If it's just the position you need, a simple technique I like is to replace all inner paren pairs until you get a "cleaned" string, and then search that string for whatever you need.

    This demonstrates the intermediate results:

        DB<172> $s=$s0
        => "1*((2+3)*((3+4)+5))*(6+7)+8"

        DB<173> print "$s\n" while $s =~ s# \( [^()]* \) # '.' x length($&) #gex
        1*(.....*(.....+5))*.....+8
        1*(.....*.........)*.....+8
        1*.................*.....+8

        DB<174> index $s, '+'
        => 25

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Finding in my case means to return the prefix and postfix substrings.

      I also thought of this. This way one could just find the substring before and after the retrieved position.
      Good Idea.
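
      A minimal sketch of that idea, assuming well-formed input. The helper name split_at_top_level and its two-argument interface are made up for illustration; the masking substitution is the one from the debugger session above:

```perl
use strict;
use warnings;

# Hypothetical helper: split $term at the first $op that lies outside all parens.
# Returns (prefix, postfix), or an empty list if there is no such operator.
sub split_at_top_level {
    my ( $term, $op ) = @_;

    # Work on a copy and blank out balanced groups from the inside out.
    # Each group is replaced by dots of the same length, so positions in
    # the masked copy still line up with positions in the original.
    my $masked = $term;
    1 while $masked =~ s/ \( [^()]* \) / '.' x length($&) /gex;

    my $pos = index $masked, $op;
    return if $pos < 0;    # no top-level operator found

    return ( substr( $term, 0, $pos ), substr( $term, $pos + length $op ) );
}

my ( $prefix, $postfix ) = split_at_top_level( "1*(2+3)*(3+4)+5*(6+7)", '+' );
print "prefix:  $prefix\n";
print "postfix: $postfix\n";
```

      Because the mask keeps the string length unchanged, the position found in the masked copy can be used directly with substr on the original term.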

        Sounds like what you really need is a lexer.

        hdb regularly posts such solutions based on dispatch tables, worth searching for...

        HTH! =)

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        Something like this:

            use strict;
            use warnings;

            my $input = "1*(2+3)*(3+4)+5*(6+7)";
            my $level = 0;
            my $tab   = "| ";

            # Dispatch table: one handler per character of interest
            my %action = (
                '('       => sub { print "\n", $tab x ++$level, shift },
                ')'       => sub { print "\n", $tab x $level--, shift },
                '+'       => sub { print "This one>>>" unless $level; print +shift },
                'default' => sub { print shift },
            );

            ( $action{$_} // $action{'default'} )->($_) for $input =~ /./g;

        The additional effort is only worthwhile if you have ambitions beyond your initial question.

Re: Regex: first match that is not enclosed in parenthesis
by monkprentice (Novice) on Jun 30, 2013 at 17:04 UTC

    I'm using this code in a little project I started out of boredom. Its purpose is to print function graphs on the command line, much as gnuplot can.

    At this point it parses mathematical expressions into a kind of abstract syntax tree and plots them on the command line. I knew that I could have defined a grammar for it and implemented a serious parser and lexer, but I wanted to do it myself first and then see what I could have done better.

    So far, the supported expressions include only +-*/ and nested parentheses. Since I want to add more expressions and functions, I figured it would be a pain to deal with operator precedence unless I took a more general approach based on precedence levels.
    The initial question arose because I want to find the first operator with the lowest precedence and recursively build subtrees from the prefix and postfix terms of that operator.
    The difficult part for me was how to treat the parenthesized terms as atomic units during this operator lookup.
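
    A rough sketch of that approach, not the project's actual code. It assumes well-formed input with no whitespace and no unary operators; the names @LEVELS, find_top_level, parse and to_sexpr, and the [op, left, right] node layout, are all invented for illustration:

```perl
use strict;
use warnings;

# Assumed precedence levels, lowest first.
my @LEVELS = ( [ '+', '-' ], [ '*', '/' ] );

# Position of the first character from @ops at paren depth 0, or -1.
# Anything inside parentheses is skipped, so it is treated as one unit.
sub find_top_level {
    my ( $term, @ops ) = @_;
    my %is_op = map { $_ => 1 } @ops;
    my $depth = 0;
    for my $i ( 0 .. length($term) - 1 ) {
        my $c = substr $term, $i, 1;
        if    ( $c eq '(' )             { $depth++ }
        elsif ( $c eq ')' )             { $depth-- }
        elsif ( !$depth && $is_op{$c} ) { return $i }
    }
    return -1;
}

# Build a tree of [op, left, right] nodes; leaves are plain strings.
sub parse {
    my ($term) = @_;
    for my $level (@LEVELS) {
        my $pos = find_top_level( $term, @$level );
        next if $pos < 0;
        return [
            substr( $term, $pos, 1 ),
            parse( substr( $term, 0, $pos ) ),
            parse( substr( $term, $pos + 1 ) ),
        ];
    }

    # No top-level operator: either a fully parenthesised term or a leaf.
    return parse($1) if $term =~ /^\((.*)\)$/;
    return $term;
}

# Render the tree in prefix notation for inspection.
sub to_sexpr {
    my ($node) = @_;
    return $node unless ref $node;
    my ( $op, $left, $right ) = @$node;
    return "($op " . to_sexpr($left) . ' ' . to_sexpr($right) . ')';
}

print to_sexpr( parse("1*(2+3)*(3+4)+5*(6+7)") ), "\n";
# prints (+ (* 1 (* (+ 2 3) (+ 3 4))) (* 5 (+ 6 7)))
```

    Splitting at the first occurrence makes operators of the same level group to the right, so 8-4-2 parses as 8-(4-2). Splitting at the last top-level occurrence instead would give the usual left-to-right grouping.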
