PerlMonks  

Re: Capturing brackets within a repeat group [plus dynamic backreferences]

by ihb (Deacon)
on Jan 11, 2003 at 11:30 UTC ( [id://226073]=note )


in reply to Capturing brackets within a repeat group

The typical expression to illustrate this is   /(.)*/s   It leaves the last char of the string in $1, if there is one. If you step back from the screen and look at the pattern again, you might think this makes sense. Looking at the capturing part, (.), I think you'd want $1 to be one char long.
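A quick check of that behaviour, as a minimal sketch (the variable names are only illustrative and not from the original post):

```perl
use strict;
use warnings;

# In list context a successful match returns the captures, here just $1.
my ($last) = 'abcd' =~ /(.)*/s;
print "captured: $last\n";   # only the final iteration's character survives

# On an empty string the group never runs, so the capture stays undefined.
my ($none) = '' =~ /(.)*/s;
print defined $none ? "defined\n" : "undef\n";
```

Each pass through the repeated group overwrites $1, so 'a', 'b' and 'c' are gone by the time the match completes.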

Expanding the issue a bit, would you want
  'abcd' =~ /(?:(.)(.))*/s
or
  'abcd' =~ /(?:(.){2})*/s
to set
  $1 eq 'a'
  $2 eq 'b'
  $3 eq 'c'
  $4 eq 'd'
?

What could potentially get really messy is if you have another group at the end:   'abcdx' =~ /(?:(.)(.))*(.)/s   How would you easily know what the last group matched? (Ignoring Re: Multiple matches of a regex.) Sure, you can use $+, or even $^N in recent perls. But what if it's the second-to-last match?
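To make the numbering concrete, here is a small sketch of what Perl actually sets for that pattern. The variable names are illustrative only, and $^N assumes perl 5.8 or later:

```perl
use strict;
use warnings;

my ($first, $second, $third, $last_paren, $last_closed);
if ('abcdx' =~ /(?:(.)(.))*(.)/s) {
    # $1 and $2 keep only the final iteration of the repeated group.
    ($first, $second, $third) = ($1, $2, $3);
    $last_paren  = $+;    # highest-numbered group that matched
    $last_closed = $^N;   # most recently closed group
    print "\$1=$first \$2=$second \$3=$third \$+=$last_paren \$^N=$last_closed\n";
}
```

Both $+ and $^N point at the trailing group here, so neither tells you anything about the earlier iterations ('a' and 'b'), which is exactly the problem described above.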

This also raises the question of how you'd do backreferences if you can't decide at regex compile-time which variable will hold the submatch.

But this being Perl you of course can do what you want. Here's a little demonstration where I want to match subsequent words with nothing but spaces in between:
  $_ = 'foo bar baz burk | gah';
  my @words;
  /(?:(\w+)\s+(?{push @words => $1}))*/; # Not backtracking safe! See below.
  # Submatches are in @words now.
If we look back at the issue of backreferencing you can use (??{}) to create dynamic backreferences. This pattern below requires the last two words to be identical (but it doesn't include the last word in @words; compare to /(.)\1/).
  my @words;
  'foo bar baz baz burk | gah' =~ /
    (?{ local @_words })
    (?:
      (\w+)
      \s+
      (?{ local @_words = (@_words, $1) })
    )+
    (??{ quotemeta $_words[-1] })
    (?{ @words = @_words })
  /x;
This version is also backtracking safe. The one above wasn't, but it didn't need to be. As you can see, it's a bit of extra work to make it backtracking safe, so I kept it simple in the one that didn't need it.
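To see why the plain push version is not backtracking safe, here is a minimal sketch with a pattern that is forced to backtrack and then fail. The sample string, the trailing \w$ and the @seen array are my own illustration, not from the original post:

```perl
use strict;
use warnings;

our @seen;   # package variable, so the code block can reach it

# The group consumes "aa " and "bb ", then the trailing \w$ cannot match
# "cc", so the engine backtracks and the match as a whole fails.
my $ok = 'aa bb cc' =~ /^(?:(\w+)\s+(?{ push @seen, $1 }))*\w$/;

print $ok ? "matched\n" : "no match\n";
print "left behind: @seen\n";
```

The pushes made on the abandoned paths are never undone, so @seen is left holding words from a match that did not happen. With local, as in the second pattern above, the engine restores the previous value whenever it backs out of an iteration.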

Hope I've helped,
ihb

Replies are listed 'Best First'.
Re: Re: Capturing brackets within a repeat group [plus dynamic backreferences]
by BrowserUk (Patriarch) on Jan 11, 2003 at 22:18 UTC

    Hope I've helped,

    In truth, I think you missed the point entirely. :^)


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      Besides looking for documentation, I thought you perhaps wanted an explanation of why Perl's current behaviour is sane and to be expected, and to be given a way to achieve what you thought Perl would do for you. Let me set my general reply in the context of MAC address parsing:

      First things first though:
        local $_ = join ':', qw/0 0A 0C B B8 F/; # $mac
        my $part = qr/[0-9A-Z]{1,2}/;
      First you used   my @parts = /^($part):($part):($part):($part):($part):($part)$/; which worked. Then you tried to shrink it to
      my @parts = / ^ (?: ($part) : ){5} ($part) $ /x;
      but that didn't work. Now, using "my" technique you just need to add three to four lines to achieve what you want.
        use re 'eval'; # Needed due to interpolation of $part
        my @parts;
        /
          (?{ local @_parts })
      
          ^
          (?:
            ($part)
            :
            (?{ local @_parts = (@_parts, $1) })
          ){5}
          ($part)
          $
      
          (?{ @parts = (@_parts, $2) })
        /x;
      
      The beauty of this technique is that you don't have to know how many times you need/want to match; something that is required if you use the x operator.
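For comparison, a minimal sketch of the x-operator approach mentioned above, assuming the same $part pattern and sample address used in this thread (the names $mac and $re are only illustrative). Note that the repeat count has to be written down when the pattern is built:

```perl
use strict;
use warnings;

my $mac  = join ':', qw/0 0A 0C B B8 F/;
my $part = qr/[0-9A-Z]{1,2}/;

# Build "($part):($part):...:($part)" with six separate capture groups.
# The count (6) must be known at the time the pattern is constructed.
my $re = join ':', ("($part)") x 6;

my @parts = $mac =~ /\A$re\z/;
print "@parts\n";
```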

      If you just want to solve this particular problem, why not simply verify with your second more compact regex and then split it up on /:/?
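That verify-then-split suggestion could look roughly like this, again assuming the $part pattern and sample address from above ($mac is an illustrative name):

```perl
use strict;
use warnings;

my $mac  = join ':', qw/0 0A 0C B B8 F/;
my $part = qr/[0-9A-Z]{1,2}/;

my @parts;
# First pass: validate the overall shape without capturing anything.
if ($mac =~ / \A (?: $part : ){5} $part \z /x) {
    # Second pass: pull the six fields apart.
    @parts = split /:/, $mac;
}
print "@parts\n";
```

It reads the string twice, but it needs no code blocks inside the regex.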

      Update:
      Since I got a negative response to this reply, I reworded the beginning to better express what I meant. If it sounded offensive or bad in any way, that wasn't how it was meant, and I apologize.

      ihb

        First. It wasn't me that gave you a negative response, and whilst I didn't see what you said originally, I doubt I would have been offended. My rather terse reply (with smiley) to your original post was simply because I read that post several times and missed the relevance. This post clarifies your intent nicely, thank you.

        Now to the contents of this post:).

        beauty of this technique is that you don't have to know how many times you need/want to match;

        This is where I felt you missed the point of my original post in as much as, not only do I know exactly how many parts I'm trying to capture, I only want to capture if there are exactly that number of parts to be captured. Hence the choice of using an exact repeat count {5}.

        ... why not simply verify with your second more compact regex and then split it up on /:/?

        Ah! Now that does offend me:^) Or rather, it offends my sense of efficiency. Using a regex to verify, and then split to extract the parts means parsing the string twice which seems wasteful when it can be done in a single pass. As has been liberally discussed elsewhere, this is almost certainly a micro-optimisation which in the big scheme of things in any given answer here, is hardly worth the effort, but...

        I (contrary to popular opinion) use my time and efforts here at the monastery as a learning experience. That is to say, whilst I sincerely hope that any answers I provide assist the OP to whom I provide them, much more significant from my personal perspective is that every single question I have a crack at means that I learn, re-learn or reinforce some aspect of my knowledge of Perl. And one of the things that I try to learn whilst attempting to answer each question is "Is there a better way of doing it?".

        Now the definition of "better" can vary. Often this can mean 'clearer' or 'simpler'. Aristotle has an uncanny knack of taking a piece of my oft tortuous code and simplifying it using idiomatic perl, rendering a much simpler, clearer solution. sauoq invariably sees through any quick & dirty regexes and provides graphic demonstrations of my bad assumptions. Too many others to mention have contributed to my learning with similar demonstrations of skill and ingenuity.

        One of my personal favorite definitions of "better" is 'more efficient'. Whilst the increased efficiencies shown in short snippets generated as answers to specific SoPWs are often of little consequence, by learning which techniques are more efficient at this level, I hope that as my own projects get more complex, I will be better equipped to write efficient code at levels where it becomes significant.

        As an example (with no disrespect to the author intended), I recently attempted to optimise a piece of code that made liberal use of Math::Round. This module has one piece of code that is particularly clever--the mechanism of determining the smallest value greater than 0.5 that perl's floating point representation can support on any given platform. This is apparently--and I am not sufficiently au fait with the vagaries of floating point math and FP processors to argue--quite important to the process of accurate rounding.

        If you take a quick look at the code in this module, you may notice several places where it could easily be made slightly more efficient, but nothing in particular stands out as demonstrably inefficient. However, in the context of a graphical application that makes heavy use of the functions in that module whilst processing 2- and 3-dimensional arrays of floating point values representing 2d and 3d coordinate vectors, at the inner levels of loops nested 2 and 3 deep, those small inefficiencies mount up. In this particular case, dramatically so.

        So, whilst in the context of the module, intuitively writing efficient code may seem unnecessary, in the wider context of the applications that use it, the inefficient code that results from not knowing better can have a dramatic effect on overall performance and usability. This effect is multiplied if the inefficient code is itself a dependency of other modules that are themselves written without consideration for efficiency, especially if one or more of those levels makes use of perl's OO (or tie) facilities, which are themselves fairly costly. Of course there is the argument that if efficiency is a high priority for your application, then you should probably use a different language, but I eschew this on the basis that the loss of convenience and the increased development time that result from moving to C, C++ or Java far outweigh the benefits. Especially as in many (though not all) cases, a little knowledge or experimentation to find the most efficient of the MWTDI can mean that the performance achieved using perl is adequate.

        To this end, if I think I see a more efficient method of achieving any particular goal in perl, I tend to explore it. And if it proves to be more efficient, and doesn't require too much sacrifice in terms of clarity, brevity or simplicity, then I tend to prefer that method over any other when writing similar code, on the basis that one day that code may be called by other code from within a loop or recursive process such that the efficiency will become important.

        Out of interest, I realised that I have been here before and have made use of your push-to-array-from-regex-nested-code-block technique (to coin a name:). See Efficient run determination for the nitty-gritty.


