PerlMonks  

regex question/mystery

by Anonymous Monk
on Jan 17, 2006 at 18:29 UTC (#523792)
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm wondering why my regex only matches the first half of my expression and ignores my 'or' using the following line:
if (/^ifSpeed.(\d+)|^1.3.6.1.2.1.2.2.1.5.(\d+)/){ print "Index: $1\n"; }
If 'ifSpeed' is seen, the index is found in $1, but if the second format is seen, $2 is set instead.
I'm obviously missing something here.
Thanks

Replies are listed 'Best First'.
Re: regex question/mystery
by brian_d_foy (Abbot) on Jan 17, 2006 at 18:39 UTC

    Do you mean that $1 is set and $2 isn't if you find ifSpeed, but it's the other way around for the 1.3.6...?

    To make things simple, Perl assigns the memory variables based on the order of the opening parentheses. You don't have to worry about match order or nesting that way.
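A minimal sketch of that numbering rule, run against the pattern from the question. The helper name `capture_slots` and the two sample lines are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: apply the pattern from the question and report
# what each numbered capture variable holds afterwards.
sub capture_slots {
    my ($line) = @_;
    if ($line =~ /^ifSpeed.(\d+)|^1.3.6.1.2.1.2.2.1.5.(\d+)/) {
        # Groups are numbered by their opening parenthesis, left to right,
        # so the group in the second alternative is always $2.
        return (defined $1 ? $1 : 'undef', defined $2 ? $2 : 'undef');
    }
    return;
}

for my $line ('ifSpeed.12', '1.3.6.1.2.1.2.2.1.5.12') {
    my ($first, $second) = capture_slots($line);
    printf "%s: \$1=%s \$2=%s\n", $line, $first, $second;
}
```

Whichever alternative matches, the group that did not take part is left undefined rather than being renumbered.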

    Perhaps you wanted this regular expression that only has one thing to remember:

    /^(?:ifSpeed.|1.3.6.1.2.1.2.2.1.5.)(\d+)/

    The first group of parentheses uses ?: to tell Perl they are just for grouping (so no memory variable). That way, the alternation is a single unit and the stuff that comes after either prefix shows up in $1.
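As a rough sketch of how the grouped pattern behaves on both input forms (the sub name `extract_index` and the sample lines are assumptions, not part of the original question):

```perl
use strict;
use warnings;

# Hypothetical helper: return the index from either prefix form,
# or undef when the line matches neither.
sub extract_index {
    my ($line) = @_;
    # (?:...) groups the two prefixes without creating a capture,
    # so (\d+) is the first and only capturing group.
    return $line =~ /^(?:ifSpeed.|1.3.6.1.2.1.2.2.1.5.)(\d+)/ ? $1 : undef;
}

for my $line ('ifSpeed.3', '1.3.6.1.2.1.2.2.1.5.3') {
    print "Index: ", extract_index($line), "\n";
}
```

Both lines should report the index through $1, so the calling code no longer has to check two variables.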

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
      That's exactly what I needed, and thanks for the explanation.

      I'm always learning!
      Thanks again!

Re: regex question/mystery
by Tanktalus (Canon) on Jan 17, 2006 at 18:41 UTC

    While it's generally handy if you show some example input, in this case your question is "why is $2 set?" which doesn't need sample input. The answer is because it's the second parenthesised value in the regexp. Perl's REs never compress the list of found values in case knowing which one is which is important. In this case, a simplification will get you what you want:

    if (/^(?:ifSpeed.|1.3.6.1.2.1.2.2.1.5.)(\d+)/){
    This way, there is only one set of capturing parens. The first set of parens has the ?: modifier which says "this is for grouping only, not for capturing."

Re: regex question/mystery
by blazar (Canon) on Jan 17, 2006 at 18:40 UTC

    I'm not really sure I understand what you mean. Of course, if "the first half of your expression" matches, then the second one won't. Do you really need that alternation? Wouldn't it be better to split it into two separate regexen? Alternatively, isn't it that you really want

    /^(?:$begin1|$begin2)(\d+)/

    instead?

    Also, you seem to be familiar with regexen, so I may well be wrong, but your use of dots is somewhat suspect, so I dare to ask: are you aware that "." matches "any character"?
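One possible way to build such a pattern from variables, sketched under the assumption that $begin1 and $begin2 hold the two literal prefixes from the question. The \Q...\E quoting is an addition here, not something the reply itself proposes:

```perl
use strict;
use warnings;

# Assumed values for the two prefixes; the variable names follow the reply.
my $begin1 = 'ifSpeed.';
my $begin2 = '1.3.6.1.2.1.2.2.1.5.';

# \Q...\E quotes regex metacharacters in the interpolated text,
# so each dot in a prefix only matches a literal dot.
my $index_re = qr/^(?:\Q$begin1\E|\Q$begin2\E)(\d+)/;

for my $line ('ifSpeed.8', '1.3.6.1.2.1.2.2.1.5.8') {
    if ($line =~ $index_re) {
        print "$line gives index $1\n";
    }
    else {
        print "$line does not match\n";
    }
}
```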

Re: regex question/mystery
by imagestrips (Initiate) on Jan 17, 2006 at 18:44 UTC
    Hello, although I might not have the answer: in the regex provided, the dots will match any character, therefore allowing for errors. Try escaping the dots with a backslash, like \.. H.
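A small sketch of the difference escaping makes. The input string is invented for illustration: it uses hyphens where the real prefix has dots.

```perl
use strict;
use warnings;

# Unescaped: every "." matches any character.
my $unescaped = qr/^1.3.6.1.2.1.2.2.1.5.(\d+)/;
# Escaped: every "\." matches only a literal dot.
my $escaped   = qr/^1\.3\.6\.1\.2\.1\.2\.2\.1\.5\.(\d+)/;

# Made-up input that is not the dotted prefix at all.
my $bogus = '1-3-6-1-2-1-2-2-1-5-77';

for my $pair (['unescaped', $unescaped], ['escaped', $escaped]) {
    my ($name, $re) = @$pair;
    my $result = $bogus =~ $re ? "matches, index $1" : "does not match";
    print "$name pattern $result\n";
}
```

The unescaped pattern accepts the hyphenated string, while the escaped one rejects it.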
Re: regex question/mystery
by philcrow (Priest) on Jan 17, 2006 at 18:37 UTC
    Update: the answer formerly here was wrong. Sorry.

    Phil

      Your conclusion is wrong. Alternation has very low precedence, and binds loosely. In the following expression, alternation provides two alternatives, the complete expression on the left, or the complete expression on the right:

      m/fast\s(break)|(break)fast/

      The string matched by that RE must contain either "fast break" or "breakfast" (or both, but it wouldn't matter). In either case, 'break' is captured, but $& tells the rest of the story. Witness the following code:

      use strict;
      use warnings;

      my $string = "breakfast break";
      if( $string =~ m/fast\s(break)|(break)fast/ ) {
          print "\$1 contains ", defined( $1 ) ? $1 : "undef", "\n";
          print "\$2 contains ", defined( $2 ) ? $2 : "undef", "\n";
          print "The portion of the string that matched was $&\n";
      }

      __OUTPUT__
      $1 contains undef
      $2 contains break
      The portion of the string that matched was breakfast

      The alternation is constrained on each side only by the / (the beginning and end of the RE), not by (break). That being the case, there was no need to introduce additional parentheses in the OP's regex. In fact, you have now changed the outcome of his RE in another way; to get at the data he intended to capture, he now must look at $2 or $4.
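The same low precedence also decides which alternative an anchor belongs to. A short sketch with made-up words, where the only difference between the two patterns is a non-capturing group:

```perl
use strict;
use warnings;

# Alternation binds more loosely than anchors and concatenation, so this
# reads as "^cat" or "dog$", not as "^" then (cat or dog) then "$".
my $ungrouped = qr/^cat|dog$/;
# A non-capturing group limits the alternation to the two words.
my $grouped   = qr/^(?:cat|dog)$/;

for my $word ('cat', 'catalog', 'hotdog', 'dog') {
    printf "%-8s ungrouped=%s grouped=%s\n", $word,
        ($word =~ $ungrouped ? 'yes' : 'no'),
        ($word =~ $grouped   ? 'yes' : 'no');
}
```

Here 'catalog' and 'hotdog' satisfy the ungrouped pattern because each alternative carries only one of the anchors.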


      Dave
