 
PerlMonks  

Problem with alternating regex?

by dwlepage (Novice)
on Sep 11, 2012 at 02:17 UTC (#992899)
dwlepage has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I was hoping someone could help me figure out why I can't seem to make an alternating match work. I've been stumped on this one for a couple of days.

Basically, I want to match the first value found between double quotes, although the lines come in two different formats:

set zone "VLAN" vrouter "trust-vr"
set zone id 100 "Internet_Only"

In both cases I want to capture the first value and was hoping to do it in a single regex. Here is what I have that isn't quite working:

if ($line =~ /^set\szone\s("([^"]*)"|id\s\d+\s"([^"]*)")/) {
    my $zone = $1;
    print "Config line=> $lineCount; Value=> $line; zone=> $zone\n";
}

Any ideas? It matches properly on set zone "VLAN" vrouter "trust-vr", but for the second line it returns id 100 "Internet_Only", when I only want what's between the quotes.
Thanks!

Re: Problem with alternating regex?
by Athanasius (Monsignor) on Sep 11, 2012 at 02:38 UTC

    The capturing parentheses in the second format are not properly aligned with the double-quote characters. Also, you should use a non-greedy quantifier. In fact, the whole regex can be greatly simplified:

    #! perl
    use strict;
    use warnings;

    my $lineCount;

    while (my $line = <DATA>)
    {
        chomp $line;
        ++$lineCount;

        if ($line =~ / ( " .*? " ) /x)
        {
            my $zone = $1;
            print "Config line=> $lineCount; Value=> $line; zone=> $zone\n";
        }
    }

    __DATA__
    set zone "VLAN" vrouter "trust-vr"
    set zone id 100 "Internet_Only"

    Output:

    Config line=> 1; Value=> set zone "VLAN" vrouter "trust-vr"; zone=> "VLAN"
    Config line=> 2; Value=> set zone id 100 "Internet_Only"; zone=> "Internet_Only"

    Hope that helps,

    Athanasius <°(((><contra mundum

      Greediness is not an issue for the OP because using a character class ([^"]*) constrains the match in any case.
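To make the greediness point concrete, here is a small sketch comparing a greedy .*, a non-greedy .*? and the negated class [^"]* on the first line format from the question. The variable names ($greedy, $lazy, $class) are only for illustration.

```perl
use strict;
use warnings;

my $line = 'set zone "VLAN" vrouter "trust-vr"';

# Greedy .* runs on to the LAST double quote on the line.
my ($greedy) = $line =~ /(".*")/;

# Non-greedy .*? stops at the first closing quote.
my ($lazy) = $line =~ /(".*?")/;

# A negated character class cannot step over a quote,
# so greedy vs. non-greedy makes no difference here.
my ($class) = $line =~ /("[^"]*")/;

print "greedy: $greedy\n";   # "VLAN" vrouter "trust-vr"
print "lazy:   $lazy\n";     # "VLAN"
print "class:  $class\n";    # "VLAN"
```

Only the unconstrained .* overshoots; [^"]* and .*? agree on this input.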

      The double quote pairing is in fact correct - that is not the OP's problem either.

      I suspect the "complication" in the OP's regex is because specific matching is required. Your simplification is at the cost of matching almost anything.

      True laziness is hard work
        Greediness is not an issue for the OP because using a character class ([^"]*) constrains the match in any case.

        Good point.

        The double quote pairing is in fact correct
        # /^set\szone\s("([^"]*)"|id\s\d+\s"([^"]*)")/ # ^ ^ ^ ^

        Yes, it is actually the first format which has the right parenthesis in the wrong place. [Update 1: Apparently, I need better glasses! Sorry for the noise here.] But, as you say, that wasn’t the OP’s problem either.

        So, try again (with thanks to Anonymous Monk, below):

        if ($line =~ /^set\szone\s(?:("[^"]*")|id\s\d+\s("[^"]*"))/)
        {
            my $zone = $1 // $2;
            print "Config line=> $lineCount; Value=> $line; zone=> $zone\n";
        }

        Hope that actually does help!

        Update 2:
        Your simplification is at the cost of matching almost anything.

        Well, that’s a bit of an overstatement. Here is the OP’s original requirement:

        Basically I want to match the first value found between double quotes although there are two different formats for the lines.

        My first solution does match “the first value found between double quotes.” But, as you point out, it doesn’t take account of the 2 specific formats. My second solution will match only on one or other of these formats.

        Athanasius <°(((><contra mundum

Re: Problem with alternating regex?
by Anonymous Monk on Sep 11, 2012 at 02:54 UTC

    Employing the Basic debugging checklist and brian's Guide to Solving Any Perl Problem, I see:

    #!/usr/bin/perl --
    use strict;
    use warnings;
    use Data::Dump;

    my $stuff = q{set zone "VLAN" vrouter "trust-vr"
    set zone id 100 "Internet_Only"};

    open my($in), '<', \$stuff;
    while(my $line = <$in>){
        chomp $line;
        if ($line =~ /^set\szone\s("([^"]*)"|id\s\d+\s"([^"]*)")/) {
            my $zone = $1;
            #~ print "Config line=> $lineCount; Value=> $line; zone=> $zone\n";
            #~ print "Config line=> $.; Value=> $line; zone=> $zone\n";
            dd [ $., $line, { 1, $1, 2, $2 } ];
        }
    }
    __END__
    [
      1,
      "set zone \"VLAN\" vrouter \"trust-vr\"",
      { 1 => "\"VLAN\"", 2 => "VLAN" },
    ]
    [
      2,
      "set zone id 100 \"Internet_Only\"",
      { 1 => "id 100 \"Internet_Only\"", 2 => undef },
    ]

    So that shows the problem: sometimes you want $1, sometimes $2, and for the second line the value you want is in neither (it lands in $3, which the dump doesn't show). But if you replace that if block with

    use Text::Balanced qw/ :ALL /;
    my @parts = extract_multiple(
        $line,
        [ sub { extract_delimited( $_[0], '"', ); }, ],
    );
    dd\@parts;

    you get

    ["set zone ", "\"VLAN\"", " vrouter ", "\"trust-vr\""]
    ["set zone id 100 ", "\"Internet_Only\""]

    You could do the same with

    my @parts = split /("[^"]*")/, $line;
    dd\@parts;

    Or a more complete pattern from http://search.cpan.org/perldoc/Text::Balanced#gen_delimited_pat

    my @parts = split /((?:\"(?:\\\"|(?!\").)*\"|\'(?:\\\'|(?!\').)*\'))/, $line;
    dd\@parts;
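Where did Internet_Only go in the dump above? Capture groups are numbered by the position of their opening parenthesis, so in the question's pattern the outer group is $1, the first branch's inner group is $2 and the second branch's inner group is $3. A minimal sketch to check this, reusing the question's regex unchanged (@groups is just an illustrative name):

```perl
use strict;
use warnings;

my $line = 'set zone id 100 "Internet_Only"';

my @groups;
if ($line =~ /^set\szone\s("([^"]*)"|id\s\d+\s"([^"]*)")/) {
    # Numbering follows the opening parentheses, left to right:
    # $1 = outer group, $2 = inner group of branch 1,
    # $3 = inner group of branch 2.
    @groups = ($1, $2, $3);
}

for my $n (1 .. @groups) {
    my $val = $groups[$n - 1];
    printf "\$%d = %s\n", $n, defined $val ? $val : 'undef';
}
# $1 = id 100 "Internet_Only"
# $2 = undef
# $3 = Internet_Only
```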
Re: Problem with alternating regex?
by GrandFather (Cardinal) on Sep 11, 2012 at 03:12 UTC

    The problem is that $1 is the first (and outermost) set of capturing parentheses, so you capture the id \d+ prefix along with the stuff you want. A clean fix is to make the id match optional:

    use warnings;
    use strict;

    my $strs = <<STRS;
    set zone "VLAN" vrouter "trust-vr"
    set zone for "non-matched" stuff
    set zone id 100 "Internet_Only"
    quoted "stuff" without set zone
    STRS

    open my $fIn, '<', \$strs;

    while (defined(my $line = <$fIn>)) {
        next if $line !~ /^set\szone\s (?:id\s\d+\s)? "([^"]*)"/x;

        my $zone = $1;

        chomp $line;
        print "Config line=> $.; Value=> $line; zone=> $zone\n";
    }

    Prints:

    Config line=> 1; Value=> set zone "VLAN" vrouter "trust-vr"; zone=> VLAN
    Config line=> 3; Value=> set zone id 100 "Internet_Only"; zone=> Internet_Only

    which retains a fairly specific match, but allows the variability you need. Notice that (?: ...) is used to provide grouping without capturing. Oh, and the x flag lets me use some white space in the regex so the moving parts are easier to see.

    True laziness is hard work

      Thanks everyone. This has been very helpful!

Re: Problem with alternating regex?
by jwkrahn (Monsignor) on Sep 11, 2012 at 03:38 UTC
    $ perl -le'
    for my $line ( q/set zone "VLAN" vrouter "trust-vr"/, q/set zone id 100 "Internet_Only"/ ) {
        if ( $line =~ /^set\szone\s(?:"([^"]*)"|id\s\d+\s"([^"]*)")/ ) {
            my $zone = $^N;
            print $zone;
        }
    }
    '
    VLAN
    Internet_Only

    Or:

    $ perl -le'
    for my $line ( q/set zone "VLAN" vrouter "trust-vr"/, q/set zone id 100 "Internet_Only"/ ) {
        if ( $line =~ /^set\szone\s(?:id\s\d+\s|)"([^"]*)"/ ) {
            my $zone = $1;
            print $zone;
        }
    }
    '
    VLAN
    Internet_Only
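One more variation, not suggested in the thread: since Perl 5.10, a branch reset group (?| ... ) makes every alternative inside it number its captures from the same starting point, so whichever format matched, the value ends up in $1. A sketch under that assumption (Perl 5.10 or later; @lines and @zones are illustrative names):

```perl
use strict;
use warnings;
use 5.010;   # branch reset (?|...) requires Perl 5.10 or later

my @lines = (
    'set zone "VLAN" vrouter "trust-vr"',
    'set zone id 100 "Internet_Only"',
);

my @zones;
for my $line (@lines) {
    # Inside (?| ... ) both alternatives start numbering at the same
    # group, so the quoted value is always in $1.
    if ($line =~ /^set\szone\s(?|"([^"]*)"|id\s\d+\s"([^"]*)")/) {
        push @zones, $1;
    }
}

print "$_\n" for @zones;   # VLAN, then Internet_Only
```

This keeps the question's two-branch layout while avoiding the $1/$2/$3 bookkeeping.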
Re: Problem with alternating regex?
by kcott (Abbot) on Sep 11, 2012 at 05:19 UTC

    G'day dwlepage,

    Here's my take on a solution:

    #!/usr/bin/env perl

    use strict;
    use warnings;

    my $re = qr{ set \s zone \s (?> id \s \d+ \s | ) \" ( [^"]+ ) }x;
    my $out_format = "Config line=> %s; Value=> %s; zone=> %s\n";

    while (<DATA>) {
        next unless /$re/;
        chomp;
        printf $out_format => $., $_, $1;
    }

    __DATA__
    set zone "VLAN" vrouter "trust-vr"
    set zone id 100 "Internet_Only"

    Output:

    $ pm_pref_quote_regex.pl
    Config line=> 1; Value=> set zone "VLAN" vrouter "trust-vr"; zone=> VLAN
    Config line=> 2; Value=> set zone id 100 "Internet_Only"; zone=> Internet_Only

    Note that I've used the (?> ... ) construct, documented in perlre under Extended Patterns. Using this construct for alternations is a Perl Best Practices recommendation (which may or may not be important to you).
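For anyone unfamiliar with atomic groups, this small sketch shows what (?> ... ) changes: the group never gives back characters it has already matched. In the pattern above, that only stops the engine from retrying the empty alternative after id \d+ has matched; on the sample lines in this thread the captured values are the same either way. The test string 'aaa' is just an illustration.

```perl
use strict;
use warnings;

my $str = 'aaa';

# Ordinary group: a* first grabs all three a's, then backtracks
# one character so the final 'a' can match.
my $plain = $str =~ /^(?:a*)a$/ ? 1 : 0;

# Atomic group: once (?>a*) has consumed "aaa" it gives nothing back,
# so the trailing 'a' has nothing left to match and the match fails.
my $atomic = $str =~ /^(?>a*)a$/ ? 1 : 0;

print "plain: $plain, atomic: $atomic\n";   # plain: 1, atomic: 0
```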

    I also added some additional lines to test for skipped (i.e. not matched) input and arbitrary surrounding text:

    __DATA__
    set zone "VLAN" vrouter "trust-vr"
    set zone id 100 "Internet_Only"
    blah
    blah blah set zone "extra" whatever
    blah blah blah "set zone id 12345 "extra2_a" something "extra2_b"

    These tests were successful:

    $ pm_pref_quote_regex.pl
    Config line=> 1; Value=> set zone "VLAN" vrouter "trust-vr"; zone=> VLAN
    Config line=> 2; Value=> set zone id 100 "Internet_Only"; zone=> Internet_Only
    Config line=> 4; Value=> blah blah set zone "extra" whatever; zone=> extra
    Config line=> 5; Value=> blah blah blah "set zone id 12345 "extra2_a" something "extra2_b"; zone=> extra2_a

    You may be interested in Regexp::Debugger. This tool provides a visualisation of your regex in action. It is very easy to use: just add use Regexp::Debugger; near the start of your code and run your script.

    Another tool is YAPE::Regex::Explain. However, do be aware of its limitations: "There is no support for regular expression syntax added after Perl version 5.6, ...". Running it on your supplied regex produces a (somewhat lengthy) explanation.

    -- Ken

Node Type: perlquestion [id://992899]
Approved by Athanasius