http://www.perlmonks.org?node_id=523792

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering why my regex only matches the first half of my expression and ignores my 'or' using the following line:
if (/^ifSpeed.(\d+)|^1.3.6.1.2.1.2.2.1.5.(\d+)/){ print "Index: $1\n"; }
If 'ifSpeed' is seen, the index is found in $1, but if the second format is seen, the index ends up in $2 instead.
I'm obviously missing something here.
Thanks

Re: regex question/mystery
by brian_d_foy (Abbot) on Jan 17, 2006 at 18:39 UTC

    Do you mean that $1 is set and $2 isn't if you find ifSpeed, but it's the other way around for the 1.3.6...?

    To make things simple, Perl assigns the memory variables based on the order of the opening parentheses. You don't have to worry about match order or nesting that way.
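
A minimal sketch of that numbering rule, using a made-up date pattern and a hypothetical helper named numbered_captures (neither is from the original question). It only illustrates that groups are counted by their opening parenthesis, left to right, even when nested:

```perl
use strict;
use warnings;

# Hypothetical helper: returns the four captures of a nested date pattern.
# The outer group opens first, so it is $1; the inner groups are $2, $3, $4.
sub numbered_captures {
    my ($text) = @_;
    if ($text =~ /((\d{4})-(\d{2})-(\d{2}))/) {
        return ($1, $2, $3, $4);
    }
    return;
}

my @parts = numbered_captures("logged 2006-01-17 at 18:39");
print join(", ", @parts), "\n";   # 2006-01-17, 2006, 01, 17
```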

    Perhaps you wanted this regular expression that only has one thing to remember:

    /^(?:ifSpeed.|1.3.6.1.2.1.2.2.1.5.)(\d+)/

    The first group of parentheses uses ?: to tell Perl they are just for grouping (so no memory variable). That way, the alternation is a single unit and the stuff that comes after either prefix shows up in $1.
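
As a rough sketch of how this behaves, here is the pattern wrapped in a hypothetical helper, if_index, and run against two invented input lines (the real input was not shown in the question, so the sample lines are assumptions):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the index that follows either prefix,
# or undef when the line starts with neither.
sub if_index {
    my ($line) = @_;
    return $line =~ /^(?:ifSpeed.|1.3.6.1.2.1.2.2.1.5.)(\d+)/ ? $1 : undef;
}

# Invented sample lines, one per format.
for my $line ("ifSpeed.3 = 100000000", "1.3.6.1.2.1.2.2.1.5.7 = 10000000") {
    my $index = if_index($line);
    print "Index: $index\n";
}
```

Both formats now leave the index in $1, so the caller no longer has to check which alternative matched.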

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
      That's exactly what I needed, and thanks for the explanation.

      I'm always learning!
      Thanks again!

Re: regex question/mystery
by Tanktalus (Canon) on Jan 17, 2006 at 18:41 UTC

    While it's generally handy if you show some example input, in this case your question is "why is $2 set?" which doesn't need sample input. The answer is because it's the second parenthesised value in the regexp. Perl's REs never compress the list of found values in case knowing which one is which is important. In this case, a simplification will get you what you want:

    if (/^(?:ifSpeed.|1.3.6.1.2.1.2.2.1.5.)(\d+)/){
    This way, there is only one set of capturing parens. The first set of parens has the ?: modifier which says "this is for grouping only, not for capturing."
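
To see the uncompressed capture list in action, here is a small sketch that runs the original two-group pattern from the question and reports both slots. The helper name capture_slots and the sample lines are assumptions made for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: runs the original pattern and reports $1 and $2,
# substituting the string 'undef' for a group that did not take part.
sub capture_slots {
    my ($line) = @_;
    if ($line =~ /^ifSpeed.(\d+)|^1.3.6.1.2.1.2.2.1.5.(\d+)/) {
        return (defined $1 ? $1 : 'undef', defined $2 ? $2 : 'undef');
    }
    return;
}

# Invented sample lines, one per format.
for my $line ("ifSpeed.3", "1.3.6.1.2.1.2.2.1.5.7") {
    my ($first, $second) = capture_slots($line);
    print "$line: \$1=$first \$2=$second\n";
}
```

The first line fills $1 and leaves $2 undefined; the second does the opposite, which is the behaviour the question describes.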

Re: regex question/mystery
by imagestrips (Initiate) on Jan 17, 2006 at 18:44 UTC
    Hello, although I might not have the answer: in the regex provided, the dots will match any character, which allows false matches. Try escaping the dots with a backslash, like \. H.
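
A short sketch of the difference, assuming some invented look-alike input lines. The names $loose, $exact and matches are hypothetical; the two patterns differ only in whether the dots are escaped:

```perl
use strict;
use warnings;

# Unescaped dots: each "." matches any single character (except newline).
my $loose = qr/^(?:ifSpeed.|1.3.6.1.2.1.2.2.1.5.)(\d+)/;
# Escaped dots: each "\." matches only a literal dot.
my $exact = qr/^(?:ifSpeed\.|1\.3\.6\.1\.2\.1\.2\.2\.1\.5\.)(\d+)/;

# Hypothetical helper: 1 if the line matches the pattern, 0 otherwise.
sub matches {
    my ($re, $line) = @_;
    return $line =~ $re ? 1 : 0;
}

# The second and fourth lines are invented look-alikes that should not match.
for my $line ("ifSpeed.3", "ifSpeedX3", "1.3.6.1.2.1.2.2.1.5.7", "1a3b6c1d2e1f2g2h1i5j7") {
    printf "%-24s loose=%d exact=%d\n", $line, matches($loose, $line), matches($exact, $line);
}
```

The loose pattern accepts all four lines, while the escaped one accepts only the two genuine formats.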
Re: regex question/mystery
by blazar (Canon) on Jan 17, 2006 at 18:40 UTC

    I'm not really sure I understand what you mean. Of course, if "the first half of your expression" matches, then the second one won't. Do you really need that alternation? Wouldn't it be better to split it into two separate regexen? Alternatively, isn't it that you really want

    /^(?:$begin1|$begin2)(\d+)/

    instead?
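
One possible way to fill in $begin1 and $begin2, assuming they are meant to hold the two literal prefixes from the question (the values and the helper name index_of are assumptions). quotemeta backslash-escapes the dots so they match literally when interpolated:

```perl
use strict;
use warnings;

# Assumed values: the two literal prefixes, escaped for use inside a regex.
my $begin1 = quotemeta 'ifSpeed.';
my $begin2 = quotemeta '1.3.6.1.2.1.2.2.1.5.';
my $re     = qr/^(?:$begin1|$begin2)(\d+)/;

# Hypothetical helper: returns the captured index or undef.
sub index_of {
    my ($line) = @_;
    return $line =~ $re ? $1 : undef;
}

# Invented sample lines.
for my $line ("ifSpeed.12", "1.3.6.1.2.1.2.2.1.5.4") {
    print "Index: ", index_of($line), "\n";
}
```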

    Also, you seem to be familiar with regexen so I may well be wrong, but your use of dots is somewhat suspect, thus I dare to ask... are you aware that "." matches "any character"?

Re: regex question/mystery
by philcrow (Priest) on Jan 17, 2006 at 18:37 UTC
    Update: the answer formerly here was wrong. Sorry.

    Phil

      Your conclusion is wrong. Alternation has very low precedence, and binds loosely. In the following expression, alternation provides two alternatives, the complete expression on the left, or the complete expression on the right:

      m/fast\s(break)|(break)fast/

      The string matched by that RE must contain either "fast break" or "breakfast" (or both, but it wouldn't matter). In either case, 'break' is captured, but $& tells the rest of the story. Witness the following code:

      use strict;
      use warnings;

      my $string = "breakfast break";
      if( $string =~ m/fast\s(break)|(break)fast/ ) {
          print "\$1 contains ", defined( $1 ) ? $1 : "undef", "\n";
          print "\$2 contains ", defined( $2 ) ? $2 : "undef", "\n";
          print "The portion of the string that matched was $&\n";
      }

      __OUTPUT__
      $1 contains undef
      $2 contains break
      The portion of the string that matched was breakfast

      The alternation is constrained on each side only by the / delimiters (the beginning and end of the RE), not by (break). That being the case, there was no need to introduce additional parentheses into the OP's regex. In fact, you have now changed the outcome of his RE in another way: to get at the data he intended to capture, he must now look at $2 or $4.
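
The retracted answer is no longer shown, so the pattern below is only a guess at its shape: each alternative wrapped in its own capturing group. Under that assumption, this sketch shows how the extra parentheses shift the index into $2 or $4. The names $wrapped and defined_groups and the sample lines are hypothetical:

```perl
use strict;
use warnings;

# Assumed shape of the over-parenthesised pattern (not from the thread).
my $wrapped = qr/(^ifSpeed.(\d+))|(^1.3.6.1.2.1.2.2.1.5.(\d+))/;

# Hypothetical helper: returns the numbers of the groups that are defined
# after a successful match, or an empty list when nothing matches.
sub defined_groups {
    my ($line) = @_;
    return () unless $line =~ $wrapped;
    my @groups = ($1, $2, $3, $4);
    return grep { defined $groups[$_ - 1] } 1 .. 4;
}

# Invented sample lines, one per format.
for my $line ("ifSpeed.3", "1.3.6.1.2.1.2.2.1.5.7") {
    printf "%-24s sets \$%s\n", $line, join(', $', defined_groups($line));
}
```

The first format sets $1 and $2, the second sets $3 and $4, so the index is in $2 or $4 depending on which alternative matched.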


      Dave