Re^2: Speeding up named capture buffer access

by SBECK (Pilgrim)
on Dec 01, 2009 at 15:13 UTC


in reply to Re: Speeding up named capture buffer access
in thread Speeding up named capture buffer access

Good news/bad news.

The number of calls didn't change at all (I didn't really expect it to).

Oddly enough though, the time required did decrease significantly, so I'll definitely switch to using hash slices. Probably some internal optimization that I wasn't aware of.
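
For readers following along, a minimal sketch of the hash-slice form, assuming a simple hypothetical pattern and input (the real Date::Manip regexps are much more involved). Each element of the slice still goes through FETCH, which matches the observation that the call count did not change:

```perl
use strict;
use warnings;

# Hypothetical pattern and input, for illustration only.
my $re     = qr/(?<h>\d\d):(?<mn>\d\d):(?<s>\d\d)/;
my $string = '12:34:56';

my ($h, $mn, $s);
if ($string =~ $re) {
    # One slice expression instead of three separate $+{...} lookups.
    ($h, $mn, $s) = @+{qw(h mn s)};
}
print "$h $mn $s\n";    # prints "12 34 56"
```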

I'm still going to try to reduce the number of calls though... that's where the big speedup would come.


Re^3: Speeding up named capture buffer access
by BrowserUk (Pope) on Dec 01, 2009 at 15:36 UTC

    If you are going to (have to?) immediately assign the named captures to local/global variables (rather than using the named captures themselves), wouldn't you be better off avoiding the overhead of the ties completely by sticking with unnamed captures?

    I just can't see any advantage in:

    $string =~ $re; ($h,$mn,$s) = ($+{'h'},$+{'mn'},$+{'s'})

    Over (with unnamed captures):

    ($h,$mn,$s) = $string =~ $re;

    Just a not inconsiderable overhead.
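
One way to measure that overhead is the core Benchmark module. The sketch below assumes two hypothetical, deliberately simple patterns and a fixed input string; the sub names are made up for illustration, and the numbers will depend on the perl version and on the real regexps:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical patterns and input; the real regexps are far more complex.
my $named   = qr/(?<h>\d\d):(?<mn>\d\d):(?<s>\d\d)/;
my $unnamed = qr/(\d\d):(\d\d):(\d\d)/;
my $string  = '12:34:56';

# Match, then copy the three named captures out of %+.
sub via_named {
    my ($h, $mn, $s);
    ($h, $mn, $s) = ($+{h}, $+{mn}, $+{s}) if $string =~ $named;
    return ($h, $mn, $s);
}

# Match in list context and take the numbered captures directly.
sub via_unnamed {
    my ($h, $mn, $s) = $string =~ $unnamed;
    return ($h, $mn, $s);
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, { named => \&via_named, unnamed => \&via_unnamed });
```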


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I can't do that for the exact reason that I want to use named capture buffers. The regular expression is very complicated. It's basically of the form:

           $re = qr/($re1|$re2|...|$reN)/;
      
      where each of the pieces may match a valid time. But some may match a partial time (perhaps only hours and minutes), some may match a 24-hour time and others may include an AM/PM string, some may include timezone information, and because there are so many ways to express times, some of them may even have the order of the fields changed, so I wouldn't want to depend on the order of the matches always being ($h,$mn,$s).

      So, using numbered matches, I could do something like:

           foreach $re ($re1,$re2,...) {
              ($h,$mn,$s) = $string =~ $re;
              last  if ($h,$mn,$s)
           }
      
      except that that won't work because I'm relying on the order of matches (and assuming that there will always be an $h match, etc).

      With named capture buffers, I can do this so elegantly. I define each regexp, name the capture buffers (in whatever order they come in) and the named buffer will contain all the ones that actually matched. Maintaining the complicated regexps in Date::Manip is about 100 times easier now!
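
A small sketch of the idea described above, assuming two hypothetical sub-patterns (these are not the Date::Manip regexps). The same names are reused in each alternative, the fields appear in a different order, and one alternative has no seconds at all:

```perl
use strict;
use warnings;

# Hypothetical sub-patterns: the same names appear in each alternative.
my $re1 = qr/(?<h>\d{1,2}):(?<mn>\d\d):(?<s>\d\d)/;      # e.g. 12:34:56
my $re2 = qr/(?<mn>\d\d) minutes past (?<h>\d{1,2})/;    # reordered, no seconds
my $re  = qr/^(?:$re1|$re2)$/;

sub parse_time {
    my ($string) = @_;
    return unless $string =~ $re;
    # %+ holds only the named groups that took part in the match.
    my %fields = %+;
    return %fields;
}

# Format a field hash as "name=value, ..." in sorted key order.
sub show {
    my (%f) = @_;
    return join ', ', map { $_ . '=' . $f{$_} } sort keys %f;
}

my %full    = parse_time('12:34:56');
my %partial = parse_time('34 minutes past 12');
print show(%full), "\n";       # prints "h=12, mn=34, s=56"
print show(%partial), "\n";    # prints "h=12, mn=34"
```

The caller never needs to know which alternative matched or in what order its groups were declared.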

        If the optimisation is that important, couldn't you just ‘pre-compile’ by doing the counting (laboriously and by hand, if necessary) once and taking into account the various possibilities? It's less elegant, but it seems that speed rather than elegance is your primary driver (with elegance a secondary bonus).
        $re = qr/(?<h1>...)(?<m1>...)(?<s1>...)|(?<h2>...)(?<m2>...)(?<s2>...)/;
        $string =~ $re;
        ( $h, $m, $s ) = ( $1 || $4, $2 || $5, $3 || $6 );

        Okay, I can see what is driving your requirements. One possibility that might prove a little quicker is Alternative-capture-group-numbering*, i.e. the (?|...) branch-reset construct, which allows you to re-use capture numbering within different match alternatives.

        The example given at the reference above is very pertinent to your use. It might at least be worth benchmarking.
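
As a rough sketch of that construct, assuming a hypothetical two-alternative pattern in which every alternative captures hours first and minutes second (it would not help where the field order differs between alternatives):

```perl
use strict;
use warnings;

# Hypothetical pattern. Inside (?|...) each alternative restarts the
# capture numbering, so both alternatives fill $1 and $2.
my $re = qr/^(?|(\d{1,2}):(\d\d)|(\d{1,2})h(\d\d)m)$/;

sub hours_minutes {
    my ($string) = @_;
    my ($h, $mn) = $string =~ $re;    # plain list assignment, no %+ involved
    return ($h, $mn);
}

print join(':', hours_minutes('12:34')), "\n";    # prints "12:34"
print join(':', hours_minutes('7h05m')), "\n";    # prints "7:05"
```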

        *Unfortunately #anchors no longer seem to work at perldoc since they added that annoying moving menu :(


Re^3: Speeding up named capture buffer access
by vitoco (Friar) on Dec 01, 2009 at 15:41 UTC

    I don't see anything wrong with:

    %tmp = %+;

    Did you try this and count the number of calls?

    BTW, I first thought about the following as in update/append mode for hashes:

    @tmp{ keys %+ } = values %+;

    but I would expect that to increase the number of calls.

      There's not anything 'wrong' with it (i.e. it works), but now, in addition to 3 calls to FETCH for every key, there are also calls to FIRSTKEY and NEXTKEY, so the number of calls increases, and it's marginally slower.
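
The difference in call patterns can be seen with an ordinary tied hash that counts its method calls. This is only a stand-in for %+: CountingHash and count_calls are hypothetical names, and it says nothing about how Tie::Hash::NamedCapture is implemented internally:

```perl
use strict;
use warnings;

# Hypothetical stand-in for %+: a tied hash that counts method calls.
package CountingHash;
require Tie::Hash;
our @ISA = ('Tie::StdHash');
our %calls;

sub FETCH    { $calls{FETCH}++;    my $self = shift; return $self->SUPER::FETCH(@_) }
sub FIRSTKEY { $calls{FIRSTKEY}++; my $self = shift; return $self->SUPER::FIRSTKEY(@_) }
sub NEXTKEY  { $calls{NEXTKEY}++;  my $self = shift; return $self->SUPER::NEXTKEY(@_) }

package main;

tie my %captures, 'CountingHash';
%captures = (h => 12, mn => 34, s => 56);    # CLEAR/STORE are not counted

# Reset the counters, run the code, and return a copy of the counts.
sub count_calls {
    my ($code) = @_;
    %CountingHash::calls = ();
    $code->();
    return { %CountingHash::calls };
}

my $slice = count_calls(sub { my ($h, $mn, $s) = @captures{qw(h mn s)} });
my $copy  = count_calls(sub { my %tmp = %captures });

for my $case ([slice => $slice], [copy => $copy]) {
    my ($name, $counts) = @$case;
    print "$name: ", join(', ', map { $_ . '=' . $counts->{$_} } sort keys %$counts), "\n";
}
```

The slice only triggers FETCH, while the whole-hash copy also has to walk the keys through FIRSTKEY and NEXTKEY before fetching each value.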

      Basically, if there are 3 named buffers (and in the real-life module, there's usually a lot more than that), there have to be 3 calls to FETCH, but by doing the work at the level of my module, it is constantly doing a FETCH and returning to my module, then calling FETCH again, over and over.

      I want to have a Tie::Hash method which will return the entire hash. Then there will be only a single call to a Tie::Hash::NamedCapture routine (which will then call all of the FETCHes internally, but since this is all C code, it should be a lot faster).

      Unfortunately, I'm fairly certain it doesn't exist at this point, so I'm just experimenting with ways to speed up what I've got.
