PerlMonks  

regex extraction for variable number of args

by NetWallah (Canon)
on Jul 21, 2012 at 06:39 UTC ( [id://982955]=perlquestion )

NetWallah has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse something that looks like a function call, and extract the function name, and the arguments.

Attempting to do this with a single regex gets me either the first or the last arg. I can't seem to handle a variable (possibly zero) number of arguments.

Given:

$cmd=q|&COMPAREEQUAL(First-param.one, Second.param,Third-param)|;
I am trying to extract:
  • COMPAREEQUAL
  • First-param.one
  • Second.param
  • Third-param

Here is what I have tried:

perl -e '$cmd=q|&COMPAREEQUAL(First-param.one, Second.param,Third-param)|; print qq|$_;\n| for $cmd=~/.(\w+)(?:[\s\(,]+([^\s,!@#\$%&\*]*))*/'
Including the trailing asterisk gives me the last param. Excluding it gives me the first.

It is getting increasingly obvious that I am not approaching this regex correctly. Please enlighten. Thanks.

The call can have zero arguments. Arguments can contain pretty much anything other than a close-paren, whitespace, a comma, or some nasty side-effect-inducing chars. They are not quoted.

Update 1: This regex gives better results (3 args found), but for some reason it does not match the "." in Second.param. Most likely that is because it is using the (\w+) to match the second param. I'm confused about how to repeat only the second part of the regex: the attempt with the non-capturing paren does not seem to repeat that group.

$cmd=~/.(\w+)(?:[\s(,]+([^\s,!\()]*))/g
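
A small sketch of what the Update 1 regex does under /g, assuming the example command from the question (the `@pairs` array is only there for illustration). Each /g iteration restarts the whole pattern, including the leading `.(\w+)`, so the later arguments are consumed in pairs and "Second." is skipped over:

```perl
use strict;
use warnings;

my $cmd = q|&COMPAREEQUAL(First-param.one, Second.param,Third-param)|;

# Collect what $1 and $2 hold after every successful /g iteration.
my @pairs;
while ($cmd =~ /.(\w+)(?:[\s(,]+([^\s,!\()]*))/g) {
    push @pairs, "$1 / $2";
}

print "$_\n" for @pairs;
# COMPAREEQUAL / First-param.one
# param / Third-param
```

On the second iteration the `.` matches the dot in "Second.param", `(\w+)` takes "param", and the second group takes "Third-param".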

             I hope life isn't a big joke, because I don't get it.
                   -SNL

Replies are listed 'Best First'.
Re: regex extraction for variable number of args
by kcott (Archbishop) on Jul 21, 2012 at 07:14 UTC

    The following gives the output you want with the provided example:

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;

    my $cmd = '&COMPAREEQUAL(First-param.one, Second.param,Third-param)';

    my $cms_re = qr{ (?: [&( ] )? ( [^(), ]+ ) (?: [(,)] )? }x;

    say for $cmd =~ m{$cms_re}g;

    Output:

    $ pm_cmd_args_regex.pl
    COMPAREEQUAL
    First-param.one
    Second.param
    Third-param

    -- Ken

      Thank you - this is beautiful, elegant, simple and operational.

      I modified it slightly to remove what I thought was unnecessary grouping - this still works:

      qr{
          \&?             # Leading char for Sub-name
          ( [^(),\s]+ )   # Sub-name OR arg
          [(,)]?          # Trailing "(" or comma
      }x;
      Thanks also to others who posted working solutions.

      As a learning experience, I would also like to understand why the repetition attempt in my OP (the one without the /g) did not work. Any assistance on that?

                   I hope life isn't a big joke, because I don't get it.
                         -SNL

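        Without /g, a regular expression is tried once, scanning from the start of the string, and yields at most one match; you need /g to make the engine continue from where the previous match ended. Your for loop therefore only gets the captures of a single match. Within that match, a capture group inside a repeated group keeps only the text of its last iteration, so whether you see the first or the last argument hinges on whether the group is repeated.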
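
To make the point above concrete, here is a minimal sketch on a made-up string (not the original command). It contrasts a single match with a repeated capture group against /g in list context:

```perl
use strict;
use warnings;

my $str = 'a,b,c';

# One match, no /g: the inner group repeats three times,
# but $1 can only hold the text from its final iteration.
my @once = $str =~ /^(?:(\w),?)+$/;

# With /g in list context, every successful match contributes its captures.
my @all = $str =~ /(\w),?/g;

print "without /g: @once\n";   # without /g: c
print "with /g:    @all\n";    # with /g:    a b c
```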
Re: regex extraction for variable number of args
by aitap (Curate) on Jul 21, 2012 at 07:38 UTC
    Isn't split easier?
    use Data::Dumper;   # needed for Dumper below

    my $cmd = q|&COMPAREEQUAL(First-param.one, Second.param,Third-param)|;
    if ($cmd =~ /^&([^(]+)\(([^)]+)\)$/) {
        my $sub = $1;
        my @args = split /,\s*/,$2;
        print Dumper [ $sub, \@args ];
    }
    __END__
    $VAR1 = [
              'COMPAREEQUAL',
              [
                'First-param.one',
                'Second.param',
                'Third-param'
              ]
            ];
    Sorry if my advice was wrong.
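
The question allows a call with zero arguments, which the `[^)]+` above would reject. A hedged variation on the split approach follows; `parse_call` and the `&NOARGS()` input are hypothetical names invented for this sketch, not anything from the thread:

```perl
use strict;
use warnings;

# parse_call is a hypothetical helper: returns (name, args...) or an empty list.
sub parse_call {
    my ($cmd) = @_;
    # [^)]* (rather than +) lets an empty argument list match too.
    my ($name, $arg_str) = $cmd =~ /^&([^(]+)\(([^)]*)\)$/
        or return;
    # Trim the argument string, then split on commas with optional whitespace.
    $arg_str =~ s/^\s+|\s+$//g;
    my @args = split /\s*,\s*/, $arg_str;
    return ($name, @args);
}

for my $cmd ('&COMPAREEQUAL(First-param.one, Second.param,Third-param)',
             '&NOARGS()') {
    my ($name, @args) = parse_call($cmd);
    print "$name: ", scalar(@args), " arg(s): @args\n";
}
```

Splitting an empty string yields an empty list, so the zero-argument case needs no special handling beyond the relaxed quantifier.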
Re: regex extraction for variable number of args
by clueless newbie (Curate) on Jul 21, 2012 at 11:01 UTC
    hi,

    I think you're looking for the balanced-parentheses regex of Jeffrey Friedl. After tweaking for quoted strings it looks something like ...

    my ($np);
    # The initial balance parentheses expression with an embedded set is:
    #   $np=qr/\( ([^()] | (??{$np})) *\)/x;
    # And quoted strings are "(?:[^"]|\")*" and '(?:[^']|\')*' or ('|")(?:[^\1]|\\1)*\1 so
    $np=qr/
        \(                      # The opening "("
        ((                      # We'll want this hence the capturing ()
            '(?:[^']|\')*?'     #' a single quote string
          | "(?:[^"]|\")*?"     #" a double quote string
          | [^()]               # not a parentheses
          | (??{$np})
        )*)
        \)                      # and the closing ")"
        /x;

      Hi, NetWallah,

      The OP is indeed interesting and goads me into this.

      Thanks!

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Smart::Comments;

      print "\n" x5;
      my ($np_1,$np_2);
      # The initial balance parentheses expression with an embedded set is:
      #   $np=qr/\( ([^()] | (??{$np})) *\)/x;
      # And quoted strings are "(?:[^"]|\")*" and '(?:[^']|\')*' or ('|")(?:[^\1]|\\1)*\1 so
      $np_1=qr/
          \(                          # The opening "("
          ((?:                        # We'll want this hence the capturing ()
            | [^'"()]                 #' not a ' " ( or )
            | (?:'(?:[^']|\')*?')     #' a single quote string
            | (?:"(?:[^"]|\")*?")     #" a double quote string
            | (??{$np_1})             # parenthesized expression
          )*)
          \)                          # and the closing ")"
          /x;
      # Deal with the argument list
      $np_2=qr/
          (                           # We'll want this hence the capturing ()
          (?:                         #
              [^'"(),]                #' not a ' " ( ) or ,
            | (?:'(?:[^']|\')*?')     #' a single quote string
            | (?:"(?:[^"]|\")*?")     #" a double quote string
            | $np_1                   # parenthesized expression see above
          )*
          )(,|\z)                     # terminating , or \z
          /x;

      my $string=q{other stuff &COMPAREEQUAL(First-param.one,'(,',Third-param); more stuff other stuff &COMPAREEQUAL(one(foo('bar'))+two(foobar),'(,',Third-param); more stuff dude() };
      ### $string
      while ($string =~ m/\b(\w+)\s*$np_1/g) {
          my ($subroutine_name,$argument_list)=($1,$2);
          ### $subroutine_name
          ### $argument_list
          while ($argument_list =~ m/$np_2/g and $1) {
              ### argument: $1
          };
      };
      __END__
      ### $string: 'other stuff &COMPAREEQUAL(First-param.one,\'(,\',Third-param); more stuff other stuff &COMPAREEQUAL(one(foo(\'bar\'))+two(foobar),\'(,\',Third-param); more stuff dude() '
      ### $subroutine_name: 'COMPAREEQUAL'
      ### $argument_list: 'First-param.one,\'(,\',Third-param'
      ### argument: 'First-param.one'
      ### argument: '\'(,\''
      ### argument: 'Third-param'
      ### $subroutine_name: 'COMPAREEQUAL'
      ### $argument_list: 'one(foo(\'bar\'))+two(foobar),\'(,\',Third-param'
      ### argument: 'one(foo(\'bar\'))+two(foobar)'
      ### argument: '\'(,\''
      ### argument: 'Third-param'
      ### $subroutine_name: 'dude'
      ### $argument_list: ''
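
For comparison, Perl 5.10 and later can express the same balanced-parentheses idea with a named recursive subpattern instead of `(??{...})`. This is a swapped-in technique rather than the code posted above, and the sketch deliberately ignores quoted strings, so it is a simplification; the group name `paren` is arbitrary:

```perl
use strict;
use warnings;

# Named recursive subpattern (Perl 5.10+): matches one balanced (...) group.
my $balanced = qr/
    (?<paren>
        \(
        (?: [^()]++ | (?&paren) )*   # non-paren text, or a nested group
        \)
    )
/x;

my $text = q{one(foo('bar'))+two(foobar)};

# Collect each outermost balanced group.
my @groups;
while ($text =~ /$balanced/g) {
    push @groups, $+{paren};
}

print "$_\n" for @groups;
# (foo('bar'))
# (foobar)
```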
Re: regex extraction for variable number of args
by Anonymous Monk on Jul 21, 2012 at 07:18 UTC

    It's got something to do with $2 being a scalar that can hold only a single value. Try the following with -Mre=debug

    $ perl -le " $_ = q{abcdef}; print for m{(.)(.)}g; "
    a
    b
    c
    d
    e
    f
    $ perl -le " $_ = q{abcdef}; print for m{(.)(.)*}g; "
    a
    f

    So I think the (old) idiom is this

    $cmd=q|&COMPAREEQUAL(First-param.one, Second.param,Third-param)|;
    my $re = qr{
        \G
        (?:
            \&
          | (\w+)               # $1
            \(
          | (?:
                (               # $2
                    [^,\)\s]+
                )
              | (?: [,\s]+ )
            )
          | \)
        )
    }mx;
    while( $cmd =~ m{$re}g ){
        use Data::Dump;
        dd [ $1, $2 ];
    }
    __END__
    [undef, undef]
    ["COMPAREEQUAL", undef]
    [undef, "First-param.one"]
    [undef, undef]
    [undef, "Second.param"]
    [undef, undef]
    [undef, "Third-param"]
    [undef, undef]

    Double-undef is when the non $1/$2 parts match
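
A related way to write the \G idiom, sketched here under the assumption that the input looks like the example command, is to use several small /gc matches instead of one alternation. That avoids the double-undef results because each branch handles its own token:

```perl
use strict;
use warnings;

my $cmd = q|&COMPAREEQUAL(First-param.one, Second.param,Third-param)|;

# \G anchors each match where the previous one ended;
# /c keeps pos() intact when a match attempt fails.
$cmd =~ /\G&(\w+)\(/gc or die "no function name found\n";
my $name = $1;

my @args;
while (1) {
    if    ($cmd =~ /\G[,\s]+/gc)      { next }             # separators
    elsif ($cmd =~ /\G([^,\s()]+)/gc) { push @args, $1 }   # one argument
    else                              { last }
}

# True only if the argument list ends with the closing paren.
my $closed = $cmd =~ /\G\)/gc;

print "name: $name\n";
print "arg:  $_\n" for @args;
```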
