PerlMonks  

Matching a pattern which resides within round brackets

by jbl_bomin (Acolyte)
on Nov 26, 2008 at 21:57 UTC ( #726238=perlquestion )
jbl_bomin has asked for the wisdom of the Perl Monks concerning the following question:

I ran into something which I thought should be easy, but has stumped me. I'm trying to match data within a round bracket. For example, take the following string:

$var = "(this is a good idea!), No it's not";

Now here's the regex I'm using to extract this data:

if ( $var =~ m/^\((.*)\),(.*)$/ ) { @args = ($1, $2); }

But this doesn't seem to work. For some reason it's not escaping the brackets correctly. I've tried this as well...

if ( $var =~ m/^[(](.*)[)],(.*)$/ ) { @args = ($1, $2); }

But no good. When I only escape the first bracket with [(], I seem to be getting somewhere, but as soon as I finish it with [)], it's broken again. Many thanks and blessings be upon you.

~~UPDATE~~

Alright, lemme try this again. The best way I can explain this is with some working (or should be working) code:

#!/usr/bin/perl -w
use strict;
use Data::Dumper;

my $data = "U|(Memory is incorrectly balanced between the NUMA ";
$data .= "nodes of this system, which will lead to poor performance. ";
$data .= "See),_NUMA_CRIT_,;(/proc/vmware/NUMA/hardware),_NUMA_CRIT_";
$data .= ",__REFERAL__file:__SELF__;( for details on your current ";
$data .= "memory configuration),_NUMA_CRIT_,;";

(my $m_type = $data) =~ s/^((?:U|M))\|.*$/$1/;
(my $p_list = $data) =~ s/^(?:U|M)\|(.*)$/$1/;

my @sub_p_tmp = split(';',$p_list);
my @args;

for ( my $i = 0; $i <= $#sub_p_tmp; $i++ ) {
    if ( $sub_p_tmp[$i] =~ m/^\((.*)\),(.*),(.*)[,;]$/ ) {
        @args = ($1, $2, $3);
    }
}

#print "$m_type\n\n$p_list\n\n".Dumper(@sub_p_tmp)."\n";
print "\nArgs:\n";
print Dumper(@args)."\n";

The IF statement has a regex that should be working, returning output as follows:

Args:
$VAR1 = 'Memory is incorrectly balanced between the NUMA nodes of this system, which will lead to poor performance. See';
$VAR2 = '_NUMA_CRIT_';

However, Dumper doesn't return anything.

Re: Matching a pattern which resides within round brackets
by gwadej (Chaplain) on Nov 26, 2008 at 22:05 UTC

    I don't see anything wrong with your regex. I tried

    print "$1\n$2\n" if $var =~ /^\((.*)\),(.*)$/;

    on your string and got

    this is a good idea!
     No it's not

    exactly as I would expect.

    On a slightly different topic, I would have generally used ([^)]*) rather than (.*) as the first capture. It is more likely to do what you want.
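
    The difference between the two captures only shows up when the string contains more than one closing paren. A minimal sketch, using a made-up input string (not from the original post) with two parenthesised groups:

```perl
use strict;
use warnings;

# Hypothetical input with two parenthesised groups, chosen to expose the difference.
my $text = "(first), middle (second), tail";

# Greedy .* backtracks from the end, so it stops at the LAST "),"
# and the capture spans both groups.
my ($greedy) = $text =~ /^\((.*)\),/;

# [^)]* cannot cross a closing paren, so it stops at the first one.
my ($negated) = $text =~ /^\(([^)]*)\),/;

print "greedy:  $greedy\n";
print "negated: $negated\n";
```

    On the single-group string from the question both forms capture the same text, which is why the original regex works there.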

    G. Wade
Re: Matching a pattern which resides within round brackets
by moritz (Cardinal) on Nov 26, 2008 at 22:11 UTC
    It would help to know your input data, and what you would like to match. In general you can escape meta characters by prepending a backslash; that should work fine.

    The usual approach is this:

    my $regex = qr{
        ^           # anchor
        \(          # opening paren
        ([^)]*)     # everything but a closing paren, captured in $1
        \)          # closing paren
        ,(.*)       # the rest, captured in $2
        $
    }xs;
Re: Matching a pattern which resides within round brackets
by toolic (Chancellor) on Nov 26, 2008 at 22:13 UTC
    You should be more specific when you say "this doesn't seem to work". Please show your expected output and your actual output, if any. If you are trying to capture the parentheses, then you just need to move your backslashes a little bit:
    use strict;
    use warnings;
    use Data::Dumper;

    my $var = "(this is a good idea!), No it's not";
    if ( $var =~ /^(\(.*\)),(.*)$/ ) {
        my @args = ($1, $2);
        print Dumper(\@args);
    }

    __END__
    $VAR1 = [
              '(this is a good idea!)',
              ' No it\'s not'
            ];

    Also, it would be easier to read your post if you did not use "pre" tags.

Re: Matching a pattern which resides within round brackets
by ww (Bishop) on Nov 27, 2008 at 00:19 UTC

    Good answers above; suggestion below as a variant way of saying "Welcome to the Monastery."

    Please read:

    • On asking for help which may help understand why Monks asked that you outline your expectations (specifically, what you wanted) and what you actually got (as it's not absolutely clear whether you want to capture the enclosing parens or not).
    • Then check Markup in the Monastery and Writeup Formatting Tips for formatting guidance (hint: pre is strongly discouraged.)

    Re your code: Note that the [ ... ] does NOT "escape" the character(s) it encloses in the usual sense (though it will cause the paren to be treated as a literal, in a regex). But so too will the customary escape character, the backslash (\).
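
    To illustrate the point, here is a small sketch that matches the string from the question with three spellings of a literal paren: a backslash, a one-character class, and Perl's built-in quotemeta. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $var = "(this is a good idea!), No it's not";

# quotemeta returns the character with a backslash in front of it.
my ($open, $close) = map { quotemeta } '(', ')';

my @patterns = (
    qr/^\((.*)\),(.*)$/,               # backslash escapes
    qr/^[(](.*)[)],(.*)$/,             # single-character classes
    qr/^${open}(.*)${close},(.*)$/,    # quotemeta-escaped variables
);

# Each pattern should yield the same two captures.
my @results = map { [ $var =~ $_ ] } @patterns;

print join(' | ', @$_), "\n" for @results;
```

    All three patterns treat the parens as literals, so either form from the original question should match this string.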

    If you plan to do much with Perl regexen, Mastering Regular Expressions (ca $US45.) and/or O'Reilly's "Regular Expressions Pocket Reference" ( <$US10.) are well worth the investment.

Re: Matching a pattern which resides within round brackets
by TGI (Vicar) on Nov 27, 2008 at 01:11 UTC

    You may also wish to look into Text::Balanced. extract_bracketed looks like it would do the trick for you.
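
    A minimal sketch of how that might look on the string from the question, assuming the list-context form of extract_bracketed (extracted text first, remainder second). The stripping of the outer parens afterwards is an addition for illustration, not something the module does for you.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $var = "(this is a good idea!), No it's not";

# In list context extract_bracketed returns the bracketed text
# (delimiters included) followed by the remainder of the string.
my ($extracted, $remainder) = extract_bracketed($var, '()');

my $inner;
if (defined $extracted && length $extracted) {
    # Strip the enclosing parens if only the contents are wanted.
    ($inner = $extracted) =~ s/^\(|\)$//g;
    print "inside: $inner\n";
    print "rest:   $remainder\n";
}
```

    Unlike a plain regex, this approach also copes with nested parens inside the bracketed text.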


    TGI says moo

      I've updated the question to give more information.
Re: Matching a pattern which resides within round brackets
by jbl_bomin (Acolyte) on Nov 27, 2008 at 15:24 UTC

    Of course, I can always avoid this problem by doing something like this:

    $sub_p_tmp[$i] =~ s/[()]//;
    if ( $sub_p_tmp[$i] =~ m/^(.*)\,(.*)\,(.*)[,;]$/ ) {
        @args = ($1, $2, $3);
    }

    But I feel like that's cheating. I want to know why the original way doesn't work!! :)
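
    One way to investigate is to print each field next to the result of the original pattern. The sketch below is a diagnostic aid, not a fix: it rebuilds the data from the updated question, splits it the same way, and counts the commas that follow the closing paren. The pattern needs three commas after the paren (or two plus a trailing ';'), but split has already consumed every ';' and each field carries only two commas, so no field matches.

```perl
use strict;
use warnings;

my $data = join '',
    "U|(Memory is incorrectly balanced between the NUMA ",
    "nodes of this system, which will lead to poor performance. ",
    "See),_NUMA_CRIT_,;(/proc/vmware/NUMA/hardware),_NUMA_CRIT_",
    ",__REFERAL__file:__SELF__;( for details on your current ",
    "memory configuration),_NUMA_CRIT_,;";

# Drop the leading "U|" and split into fields; split removes the ';'.
my (undef, $p_list) = split /\|/, $data, 2;
my @fields = split /;/, $p_list;

# The pattern from the question, unchanged.
my $original = qr/^\((.*)\),(.*),(.*)[,;]$/;

my (@comma_counts, @matched);
for my $field (@fields) {
    # Count the commas that follow the closing paren.
    my ($tail) = $field =~ /\)(.*)$/;
    my $commas = () = $tail =~ /,/g;
    my $ok     = $field =~ $original ? 1 : 0;
    push @comma_counts, $commas;
    push @matched,      $ok;
    print "commas after ')': $commas, matches: $ok\n";
}
```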

      Seems I'm running into a problem with the comma when I remove the brackets like that... ah well, need to experiment a bit more.
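
      Two things may be going on with the bracket-stripping approach; the sketch below shows both. Without the /g flag, s/[()]// removes only the first bracket it finds. And once both brackets are gone, a comma inside the message text can no longer be told apart from a separator. The second field is a shortened, made-up example, not the real data.

```perl
use strict;
use warnings;

my $field = '(/proc/vmware/NUMA/hardware),_NUMA_CRIT_,__REFERAL__file:__SELF__';

# Without /g only the first bracket is removed; the ')' stays behind.
(my $once = $field) =~ s/[()]//;

# tr with /d deletes every occurrence of both brackets.
(my $all = $field) =~ tr/()//d;

print "once: $once\n";
print "all:  $all\n";

# Hypothetical shortened field whose message text contains a comma.
my $with_comma = '(this system, which is slow),_NUMA_CRIT_,';
(my $stripped = $with_comma) =~ tr/()//d;

# The message is now split in two because its own comma looks like a separator.
my @pieces = split /,/, $stripped;
print scalar(@pieces), " pieces: ", join(' | ', @pieces), "\n";
```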
Re: Matching a pattern which resides within round brackets
by jbl_bomin (Acolyte) on Nov 27, 2008 at 16:58 UTC
    FYI (just to bring this to a close, and for anyone else who may run into the problem)..

    I've been able to resolve this with the following code, which seems to give me the output I'm looking for:

    #!/usr/bin/perl -w
    use strict;
    use Data::Dumper;

    my $data = "U|(Memory is incorrectly balanced between the NUMA ";
    $data .= "nodes of this system, which will lead to poor performance. ";
    $data .= "See),_NUMA_CRIT_,;(/proc/vmware/NUMA/hardware),_NUMA_CRIT_";
    $data .= ",__REFERAL__file:__SELF__;( for details on your current ";
    $data .= "memory configuration),_NUMA_CRIT_,;";

    (my $m_type = $data) =~ s/^((?:U|M))\|.*$/$1/;
    (my $p_list = $data) =~ s/^(?:U|M)\|(.*)$/$1/;

    my @sub_p_tmp = split(/;/,$p_list);
    print "sub_p_tmp:\n";
    print Dumper(@sub_p_tmp);

    my @args;
    for ( my $i = 0; $i <= $#sub_p_tmp; $i++ ) {
        $sub_p_tmp[$i] =~ s/,$//;
        if ( $sub_p_tmp[$i] =~ m/^\((.*)\),(.*),(.*)$/ ) {
            push @args, ($1, $2, $3);
        }
        elsif ( $sub_p_tmp[$i] =~ m/^\((.*)\),(.*)$/ ) {
            push @args, ($1, $2);
        }
    }

    #print "$m_type\n\n$p_list\n\n".Dumper(@sub_p_tmp)."\n";
    print "\nArgs:\n";
    print Dumper(@args)."\n";

      Try this.

      #!/usr/bin/perl -w
      use strict;
      use warnings;
      use Data::Dumper;

      my $data = join '',
          "U|(Memory is incorrectly balanced between the NUMA ",
          "nodes of this system, which will lead to poor performance. ",
          "See),_NUMA_CRIT_,;(/proc/vmware/NUMA/hardware),_NUMA_CRIT_",
          ",__REFERAL__file:__SELF__;( for details on your current ",
          "memory configuration),_NUMA_CRIT_,;";

      print "DATA\n$data\n\n";

      #(my $m_type = $data) =~ s/^((?:U|M))\|.*$/$1/; # gets U
      #(my $p_list = $data) =~ s/^(?:U|M)\|(.*)$/$1/; # gets U, i think this is the reason you weren't getting anything

      # This approach does what I think you are trying for without as much work for you or the computer.
      my ($m_type, $p_list) = split /\|/, $data, 2;
      my @sub_p_tmp = split /;/, $p_list;

      print "MTYPE: $m_type\n";
      print "PLIST: $p_list\n";

      my @args;

      # Try using
      #     for my $p_tmp ( @sub_p_tmp )
      # Instead of the c style for loop, unless you are processing HUGE arrays.
      for ( my $i = 0; $i <= $#sub_p_tmp; $i++ ) {
          print "$i: $sub_p_tmp[$i]\n";

          # Here's your regex based method fixed to work
          # if ( $sub_p_tmp[$i] =~ m/^\(([^)]*)\),([^,]*),(.*)$/ )
          # {
          #     @args = ($1, $2, $3);
          #     print "\nArgs:\n";
          #     print Dumper(\@args)."\n";
          # }

          # Here's a split based approach.
          # in both approaches @args is overwritten each pass through the loop.
          my @temp = split /,/, $sub_p_tmp[$i];
          if ( @temp ) {
              # Get rid of the extra parens.
              $temp[0] =~ s/^\(|\)$//g;

              # Overwriting the value of @args each time.
              # Do you mean
              #     push @args, @temp[0..2];
              # or
              #     push @args, [ @temp[0..2]];
              $temp[0] =~ s/^\(|\)$//g;
              @args = @temp[0..2];
              print "\nArgs:\n";
              print Dumper(\@args)."\n";
          }
      }
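
      The two suggestions in the comments above (a foreach-style loop and pushing an array reference per field) can be combined. The sketch below is a variation rather than the code above: it captures the parenthesised text with a regex first and only then splits the remainder on commas, so a comma inside the message text is not mistaken for a separator. The sample $p_list is a shortened, hypothetical stand-in for the real data.

```perl
use strict;
use warnings;

# Hypothetical, shortened field list in the same shape as the real data.
my $p_list = '(text, one),_A_,;(text two),_B_,__C__;(text three),_D_,;';

my @args;
for my $p_tmp ( split /;/, $p_list ) {
    # Capture the parenthesised text, then whatever follows the "),".
    next unless $p_tmp =~ /^\(([^)]*)\),(.*)$/;
    my ($text, $rest) = ($1, $2);

    # Split the remainder on commas and drop empty trailing pieces.
    my @tags = grep { length } split /,/, $rest;

    # One array reference per field keeps the records separate.
    push @args, [ $text, @tags ];
}

print join(' | ', @$_), "\n" for @args;
```

      Each element of @args is then one record, so fields with two values and fields with three can sit side by side without an elsif branch.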


      TGI says moo

        Thanks TGI, I'll definitely play around with this!

Node Type: perlquestion [id://726238]
Approved by toolic