PerlMonks  

Regular expression quantifiers and the /gsmx modifiers

by davis (Vicar)
on Jun 16, 2012 at 21:22 UTC (#976599=perlquestion)
davis has asked for the wisdom of the Perl Monks concerning the following question:

*blows dust off perlmonks.org account* I've been away a while, but thought my Perl-fu was up to this rather simple task.

I believe I'm suffering from a rather simple misunderstanding of the /x modifier, and of the {} quantifiers. It's also the first time I've used the named capture buffers, but I'm not sure that matters.

Here's a complete script which should produce a match, so what have I done wrong?

#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;

my $vg_details = '
   --- Physical volumes ---
   PV Name                     /dev/dsk/c14t3d1
   PV Name                     /dev/dsk/c15t3d1 Alternate Link
   PV Status                   available
   Total PE                    15997
   Free PE                     0
   Autoswitch                  On
   Proactive Polling           On
';

while($vg_details =~ m/^\s*PV\s+Name\s*(?<pv_name>\S+)\s*$
                      (^\s*PV\s+Name\s+(?<alt_link>\S+)\s+Alternate\s+Link\s*$){0,20} # skip them
                      ^\s*PV\s+Status\s+(?<pv_status>\S+)\s*$
                      ^\s*Total\s+PE\s+(?<total_pe>\S+)\s*$
                      ^\s*Free\s+PE\s+(?<free_pe>\d+)\s*$
                      ^\s*Autoswitch\s+(?<autoswitch>\S+)\s*$
                      ^\s*Proactive\s+Polling\s+(?<proactive_polling>\S+)\s*$/gsmx) {
    my $pv_name = $+{pv_name};
    print "matched $pv_name";
}

The example data is slightly contrived, in that the "Alternate Link" lines are optional (and there may be many). Removing the additional 6 lines below the "PV Name" line in the regex makes it work, so have I completely confused the multi-line comment switch?

Also, I know this is a slightly ludicrous method to process VG data, but I've been handed a big, big list of "vgdisplay -v" output, and this particular edge case is failing. I've reduced it to this minimal example and my eyes still can't spot what's wrong. What silly mistake am I making?


davis

Re: Regular expression quantifiers and the /gsmx modifiers
by morgon (Deacon) on Jun 16, 2012 at 22:03 UTC
    I think the problem is that $ matches BEFORE a newline and ^ matches AFTER a newline (when using /sm), but you also need to consume the newline itself.

    Consider:

    use strict;
    my $s = <<"__end__";
    hubba
    bubba
    __end__
    print "matched 1\n" if $s =~ m/^hubba$ ^bubba$/smx;
    print "matched 2\n" if $s =~ m/^hubba$ \n ^bubba$/smx;

    Here only the second regex matches because in the first after the $ matches there is still a newline left and therefore the ^ does not match. You need to consume that in the regex as the second example shows.
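
A minimal sketch of the point above, assuming the same two-line string: under /m both `$` and `^` are zero-width, so after `$` succeeds the match position still sits in front of the newline. It also suggests why a line break typed inside a /x pattern does not help, since /x ignores that whitespace. The variable names are illustrative and not from the original posts.

```perl
use strict;
use warnings;

my $s = "hubba\nbubba\n";

# A scalar-context /g match records where it stopped in pos().
my $matched       = $s =~ m/^hubba$/mg;
my $end_of_match  = pos($s);                      # offset just after "hubba"
my $next_char     = substr($s, $end_of_match, 1); # what $ left unconsumed

print "match ended at offset $end_of_match\n";
print "the next character is still the newline\n" if $next_char eq "\n";

# Under /x the line break inside the pattern is ignored, so it does not
# stand in for the newline in the data; an explicit \n is required.
my $without_newline = $s =~ m/^hubba$
                              ^bubba$/mx ? 1 : 0;
my $with_newline    = $s =~ m/^hubba$ \n
                              ^bubba$/mx ? 1 : 0;

print "without \\n: $without_newline, with \\n: $with_newline\n";
```

The first pattern fails because `^` is tested at the offset where `$` matched, which is before the newline rather than after it.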

      Damn. I completely missed that.

      #!/usr/bin/perl
      use warnings;
      use strict;
      use Data::Dumper;

      my $vg_details = '
         --- Physical volumes ---
         PV Name                     /dev/dsk/c14t3d1
         PV Name                     /dev/dsk/c15t3d1 Alternate Link
         PV Status                   available
         Total PE                    15997
         Free PE                     0
         Autoswitch                  On
         Proactive Polling           On
      ';

      while($vg_details =~ m/^\s*PV\s+Name\s*(?<pv_name>\S+)\s*$ \n
                            (^\s*PV\s+Name\s+(?<alt_link>\S+)\s+Alternate\s+Link\s*$ \n){0,20} # skip them
                            ^\s*PV\s+Status\s+(?<pv_status>\S+)\s*$ \n
                            ^\s*Total\s+PE\s+(?<total_pe>\S+)\s*$ \n
                            ^\s*Free\s+PE\s+(?<free_pe>\d+)\s*$ \n
                            ^\s*Autoswitch\s+(?<autoswitch>\S+)\s*$ \n
                            ^\s*Proactive\s+Polling\s+(?<proactive_polling>\S+)\s*$ \n/gsmx) {
          my $pv_name = $+{pv_name};
          print "matched $pv_name";
      }

      Seems to DWIM. I genuinely cannot believe I've been gawping at that for so long. My thanks.


      davis
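
One thing worth keeping in mind with the `{0,20}` group: a capture inside a repeated group only retains the value from its final repetition, so `$+{alt_link}` would hold just the last "Alternate Link" device (and be undefined when there are none). The sketch below is one possible way to recover all of them, by capturing the whole repeated block and scanning it a second time. The sample text, the `alt_block` capture name, and the simplified pattern are assumptions made for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical, simplified input: one primary path and two alternate links.
my $text = "PV Name a\n"
         . "PV Name b Alternate Link\n"
         . "PV Name c Alternate Link\n"
         . "PV Status available\n";

my ($pv_name, $last_alt, @all_alts);

if ($text =~ m/^PV\s+Name\s+(?<pv_name>\S+)[ ]*\n
               (?<alt_block>
                   (?: ^PV\s+Name\s+(?<alt_link>\S+)\s+Alternate\s+Link[ ]*\n ){0,20}
               )
               ^PV\s+Status\s+(?<pv_status>\S+)/mx) {
    $pv_name  = $+{pv_name};
    $last_alt = $+{alt_link};      # only the final repetition survives

    # Copy the block before running another match, which would reset %+.
    my $block = $+{alt_block};
    @all_alts = $block =~ m/^PV\s+Name\s+(\S+)\s+Alternate\s+Link/mg;
}

print "primary: $pv_name\n";
print "last alternate seen by the named capture: $last_alt\n";
print "all alternates: @all_alts\n";
```

Copying the captures into ordinary variables before the second match matters here, because `%+` always reflects the most recent successful match.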

Re: Regular expression quantifiers and the /gsmx modifiers
by Kenosis (Priest) on Jun 16, 2012 at 22:33 UTC

    I removed the ^ and $ notations in your regex, and your matching worked perfectly:

    #!/usr/bin/perl
    use Modern::Perl;

    my @matched = qw {pv_name alt_link pv_status total_pe free_pe autoswitch proactive_polling};

    my $vg_details = '
       --- Physical volumes ---
       PV Name                     /dev/dsk/c14t3d1
       PV Name                     /dev/dsk/c15t3d1 Alternate Link
       PV Status                   available
       Total PE                    15997
       Free PE                     0
       Autoswitch                  On
       Proactive Polling           On
    ';

    $vg_details =~ m/^\s*PV\s+Name\s*(?<pv_name>\S+)\s*
                    (\s*PV\s+Name\s+(?<alt_link>\S+)\s+Alternate\s+Link\s*) # skip them
                    \s*PV\s+Status\s+(?<pv_status>\S+)\s*
                    \s*Total\s+PE\s+(?<total_pe>\S+)\s*
                    \s*Free\s+PE\s+(?<free_pe>\d+)\s*
                    \s*Autoswitch\s+(?<autoswitch>\S+)\s*
                    \s*Proactive\s+Polling\s+(?<proactive_polling>\S+)\s*/gsmx;

    say $+{$_} for @matched;

    Results:

    /dev/dsk/c14t3d1
    /dev/dsk/c15t3d1
    available
    15997
    0
    On
    On

      I removed the ^ and $ notations in your regex, and your matching worked perfectly...

      Of course, the reason is that a newline \n is a member of the \s regex character set, and there is an abundance of \s* in the modified regex.
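
A small sketch of that observation, using a made-up two-line fragment rather than the full vgdisplay output: because `\s` matches "\n", a `\s*` between two fields can run across the line break and the next line's indentation with no anchors at all. The variable names and sample data are assumptions for illustration.

```perl
use strict;
use warnings;

# \n is one of the characters matched by \s.
my $newline_is_space = "\n" =~ /\s/ ? 1 : 0;

# Hypothetical fragment: two fields on consecutive, indented lines.
my $data = "Total PE   15997\n   Free PE   0\n";

# The \s* after the first value absorbs the newline and the leading
# spaces of the next line, so no ^, $ or explicit \n is needed.
my ($total, $free) = $data =~ /Total\s+PE\s+(\S+)\s*Free\s+PE\s+(\d+)/;

print "newline counts as whitespace: $newline_is_space\n";
print "total=$total free=$free\n";
```

The trade-off is that such a pattern no longer checks that each field sits on its own line, so it would also accept the fields run together on one line.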
