Re: Adventures in optimization (or: the good and bad side of being truly bored)

by Limbic~Region (Chancellor)
on Aug 02, 2003 at 16:12 UTC


in reply to Adventures in optimization (or: the good and bad side of being truly bored)

revdiablo,
I am a little confused. You say that Time::Local wasn't fast enough for you, so you rolled your own version with caching. That's exactly what Time::Local already does:

These routines are quite efficient and yet are always guaranteed to agree with localtime() and gmtime(). We manage this by caching the start times of any months we've seen before. If we know the start time of the month, we can always calculate any time within the month. The start times are calculated using a mathematical formula.

So the only things I can see that would make it slower than your method are that it validates the input before acting on the data, which you can skip by importing the timelocal_nocheck variant, and, of course, that method calls are always slower than subs, which are in turn always slower than inline code.
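For instance, skipping the validation is just a matter of importing the _nocheck variant (a minimal sketch - the date values here are invented):

use Time::Local qw(timelocal timelocal_nocheck);

# Arguments mirror localtime(): sec, min, hour, mday, month (0-based), year
my $checked   = timelocal(0, 30, 12, 2, 7, 2003);           # validates every field first
my $unchecked = timelocal_nocheck(0, 30, 12, 2, 7, 2003);   # trusts the caller and skips the checks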

Ok - now that that little nit is out of the way: it almost always turns out that the way to improve the speed of a given piece of code is to add caching (if possible) or to change the algorithm.

As far as optimizing a piece of code being fun and interesting - I quite agree. You should be using Devel::Profile to determine exactly which piece of the code is chewing up time, and focus there, as it will often not be where you think the problem is. You should also always Benchmark on an idle system. Your test data should be thoroughly varied, as "real" data can often swing your results in the other direction. By that I mean that index might be slower than a regex for strings up to 30 characters, but beyond that it may take the lead and ultimately demolish the regex. If you haven't taken that into account in your test data, you will still end up with poorly optimized code.
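To illustrate the index-versus-regex point, something along these lines (an illustrative sketch - the needle and haystacks are made up) will show the winner changing with string length:

use Benchmark qw(cmpthese);

# Compare index() against a regex as the haystack grows
for my $len (10, 30, 1000) {
    my $str = ('x' x ($len - 3)) . 'foo';
    print "--- haystack length $len ---\n";
    cmpthese(-1, {
        'index' => sub { my $pos = index $str, 'foo' },
        'regex' => sub { my ($hit) = $str =~ /foo/ },
    });
}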

Optimized code that isn't the result of a new algorithm or caching is almost always harder to maintain. I have such a piece of code that I am not very proud of, even though it gets the job done. It is better to learn better algorithms than to make your code unreadable. If you absolutely need that much speed and can't get it from better algorithms/caching - maybe it is time to learn a new language.

Cheers - L~R


Re: Re: Adventures in optimization (or: the good and bad side of being truly bored)
by diotalevi (Canon) on Aug 02, 2003 at 16:25 UTC

    And to add caching - Memoize!

    As for actual profiling, it's really easy to run your scripts under Devel::DProf (which comes with perl, don'cha know): `perl5 -d:DProf test.pl`, and after it's finished, dprofpp. Sooper easy. I worked out an order-of-magnitude speed increase in some perl code by repeatedly testing my changes and assumptions against profiled code. And for those that are curious - it turns out that regex capturing can be a real dog sometimes.
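    And a minimal Memoize sketch, for the curious (slow_lookup here is an invented stand-in for an expensive function):

    use Memoize;
    memoize('slow_lookup');    # wrap slow_lookup in an argument-keyed cache

    sub slow_lookup {
        my ($key) = @_;
        sleep 1;               # stand-in for expensive work
        return uc $key;
    }

    print slow_lookup('foo'), "\n";    # slow the first time
    print slow_lookup('foo'), "\n";    # instant - answered from the cache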

      diotalevi,
      Your attention to detail never ceases to amaze me. I didn't want to mention Memoize at the beginning since it would not work for what revdiablo was trying to do, but I did want to mention it later, as a general "you can often add caching simply by using Dominus's Memoize" note - and completely forgot.

      Thanks - L~R

      Indeed. Devel::DProf is good stuff. I was very happy with my results when using it and Benchmark to, erm, profile and benchmark my code.

      As for Memoize, Limbic~Region's reply is correct. It won't really work for me in this case, because I should never have the case where my subroutine gets called twice with the same arguments (unless, of course, there's something wrong in my log and I get duplicate entries). Thanks for mentioning it, though; I've been meaning to look at it for a good while now.

        I just mentioned Memoize for the general case and the rest of our audience.

      Regarding Memoize, maybe I'm weird, but I've never found it to be particularly useful. I have never had a real circumstance (dev and toy ones, OK) where I can cache based on arguments alone, and splitting my function into subfunctions that can be so Memoized never seemed to me to be the right thing to do. I think caching properly requires some degree of thought and understanding of the overall process, which, combined with the ease of rolling your own cache in Perl, means I have never used Memoize in any production code.


      ---
      demerphq

      <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
Re: Re: Adventures in optimization (or: the good and bad side of being truly bored)
by revdiablo (Prior) on Aug 02, 2003 at 19:40 UTC

    Thanks for the reply, Limbic~Region. I didn't know about the caching built into Time::Local, and I will definitely look into it. Please note that I do still use Time::Local, but I only call it once per day found in my log, rather than tens of thousands of times per day.

    Also, though I didn't mention it in my original post, I did make extensive use of Devel::DProf. As mentioned in diotalevi's reply below, it really is easy to use, and the results are quite useful. I highly recommend it to anybody interested in improving the speed of their code, and I probably should have said something about it in my original post. Between it and Benchmark, determining where to speed up one's program becomes almost simple. :)

    I totally agree with the rest of your post too. My first optimization-related SoPW brought to my attention the problem of using test data that is not exactly representative -- though it wasn't a huge issue there. And in the case of both subroutines, a new algorithm combined with caching was indeed how I gained the huge performance increases. Again, thanks for the reply.

    Update: on further investigation, I'm not sure how useful Time::Local's caching will be in my situation. It caches at the month level, whereas I'm caching at the day level. Since my logs span only about a week, relying on Time::Local's month-level cache would still mean tens of thousands more full timelocal calls than I currently make, so I think it won't help me much. (Note: this is all untested speculation. Please feel free to destroy my assumptions at will.)

      on further investigation... 10s of thousands more .. calls

      Are you sure? I thought the cache worked like this

      $cache{$yearmonth} + ($days-1)*(24*60*60) + $hours*(60*60) + $mins*60 + $secs;

      Having said that, I'm in the odd position that I too didn't know about the _nocheck option in Time::Local, and I also wrote my own caching for it, but I did it based on the hour. I am parsing dates like "20030401101123", and I end up doing something like the following (time_to_unix is just a wrapper around timelocal that knows how to split a fragment of the above string correctly):

      ($cache{substr($date,0,10)}||=time_to_unix(substr($date,0,10))) + substr($date,10,2)*60 + substr($date,12,2);

      which also gave me a several-thousandfold speedup in my time calculations. Incidentally, I think this approach will probably significantly outperform using timelocal() (and its caching) directly. The hash lookup on the first section of the date is far cheaper than splitting the date, passing all of its parts on the stack, and having timelocal do its own checks and caching - which presumably resembles

      $cache{$year.$month}

      anyway - and then getting the results back over the stack. We trade many ops for just a few. And we get a cool bonus: since Time::Local is still validating its input, the cache actually acts as a validating filter too. Only valid YMDHs get into it, and if we don't have a hit, we either have an unknown-but-valid YMDH or a bad date, both of which Time::Local handles for us. So we get a serious speed benefit without losing any of the safety of Time::Local.
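      Fleshed out, the whole thing looks roughly like this (a sketch reconstructed from the description above - the splitting inside time_to_unix and the outer sub name are my assumptions):

      use Time::Local qw(timelocal);

      my %cache;

      # The wrapper described above: split a "YYYYMMDDHH" fragment and hand it to timelocal
      sub time_to_unix {
          my ($ymdh) = @_;
          my ($y, $m, $d, $h) = unpack 'A4 A2 A2 A2', $ymdh;
          return timelocal(0, 0, $h, $d, $m - 1, $y);
      }

      # Hour-level cache: timelocal runs once per unique hour, plain arithmetic after that
      sub stamp_to_epoch {
          my ($date) = @_;    # e.g. "20030401101123"
          return ($cache{substr($date, 0, 10)} ||= time_to_unix(substr($date, 0, 10)))
               + substr($date, 10, 2) * 60
               + substr($date, 12, 2);
      }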


      ---
      demerphq

      <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...

        demerphq++. Thanks for the reply. I did subsequently benchmark Time::Local with _nocheck, and while it was faster than without _nocheck, my home-brew cache was still substantially faster. Interesting that you decided to cache at the hour level, rather than the day. I chose the day level because converting hours to seconds is a relatively trivial calculation, but then again I guess converting days to seconds is too, so maybe caching at the month level would be just as good.

        Now I wonder if the different caching level is the reason _nocheck is slower. Perhaps it's due to the additional subroutine call, and not the different caching at all. But again this is all rank speculation... (I'm actively resisting the urge to break out my benchmark.pl and test hour, day, and month-level caching, but I think I need to just be happy with the performance I've got.)

        PS: Based on your reply here and to my post about moving averages, I have to wonder if you're not doing something relatively similar? Hopefully my posts have been somewhat helpful to you, but more likely it seems that your posts have been more helpful to me. ;)

        Update: Just thought I might clarify a bit:

        on further investigation... 10s of thousands more .. calls

        Are you sure? I thought the cache worked like this ...

        I meant tens of thousands more calls to timelocal. Your example is essentially how my cache works (though I notice a few things that would probably make yours a touch quicker than mine). My log has tens of thousands of entries within each unique day (an entry every 5 seconds, to be precise), so using plain Perl arithmetic instead of a call to timelocal for all those entries is a huge win.
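        For comparison, the day-level version of the same idea might look like this (a sketch assuming the same kind of "YYYYMMDDHHMMSS" stamp as above - my actual field splitting differs):

        use Time::Local qw(timelocal);

        my %day_cache;

        # Day-level cache: one timelocal call per unique day, then pure arithmetic
        sub stamp_to_epoch {
            my ($date) = @_;
            my $day = substr($date, 0, 8);              # "YYYYMMDD"
            $day_cache{$day} ||= do {
                my ($y, $m, $d) = unpack 'A4 A2 A2', $day;
                timelocal(0, 0, 0, $d, $m - 1, $y);     # midnight of that day
            };
            return $day_cache{$day}
                 + substr($date,  8, 2) * 3600
                 + substr($date, 10, 2) * 60
                 + substr($date, 12, 2);
        }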

      Actually... I wouldn't normally think of Benchmark unless I'm considering altering my perl style. It isn't going to help you find the slow parts in your program, and it isn't even going to tell you whether the speed difference is meaningful. I guess the only time I ever actually reach for it is when I'm doing very odd things and want to know which odd thing performs less badly. Outside of that... *shrug*.
