Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl-Sensitive Sunglasses
 
PerlMonks  

Efficiency Question

by mr.nick (Chaplain)
on Feb 23, 2001 at 19:11 UTC ( [id://60488]=perlquestion: print w/replies, xml ) Need Help??

mr.nick has asked for the wisdom of the Perl Monks concerning the following question:

Alright, so I'm reinventing the wheel with something and I want everyone's opinion. I'm writing a simple circular cache to use inside a module of mine. This cache is specific to the module and I want it to be as fast as possible.

Here's the code so far:

package GNS::Cache; use strict; ########### ## How this for strangeness ... ########### sub new { my $class=shift; my $size=shift; my $self=bless { }; $self->{size}=($size ? $size : 30); $self->{list}=[]; $self; } sub check { my $self=shift; my $id=shift; local $_; ## walk through the list foreach (@{$self->{list}}) { if ($_->[0]==$id) { return $_->[1]; } } return; } sub insert { my $self=shift; my $id=shift; my $data=shift; my $list=$self->{list}; push @$list,[$id,$data]; shift @$list if $#$list>=$self->{size}; } sub purge { my $self=shift; $self->{list}=[]; } sub dump { my $self=shift; local $_; foreach (@{$self->{list}}) { print "$_->[0] = $_->[1]\n"; } } ####### 1; __END__ #######
#1. In the operations that you see, is there anyplace where a code change could be applied that would make it faster?

#2. In the check() function, if a hit is found, I would like the item move to the top of the list (representing it should stay in the cache longer). What I need to do is remove the current item ($_ as it stands now) and tack it on to the end of the list via a push. How would you recommend doing it? I dont care about cleverness, or obsfucation, but raw number of instructions. I had though of doing an undef $_; inside the loop, but that doesn't seem to actually remove the item from the array, just undef it out (the list continues to have the same # of elements). I suspect I need to splice it, but without having the index of the element of the array, I couldn't figure out how to do it... efficiently.

#3. In my purge() function, I do a simple $self->{list}=[] to reset the list. I presume Perl's memory management is good enough to handle that style of deallocation. Correct?

There are a bazillion methods of doing this, and I've tried a few, but I'm curious as to everyone else's ideas.

Thanx! and forgive me if this is not well written, I'm just on my first cup of java.

Thanx!

Replies are listed 'Best First'.
Re: Efficiency Question
by fundflow (Chaplain) on Feb 23, 2001 at 19:18 UTC
Re: Efficiency Question
by clintp (Curate) on Feb 23, 2001 at 19:33 UTC
    #2. In the check() function, if a hit is found, I would like the item move to the top of the list (representing it should stay in the cache longer). What I need to do is remove the current item ($_ as it stands now) and tack it on to the end of the list via a push. How would you recommend doing it? I dont care about cleverness, or obsfucation, but raw number of instructions. I had though of doing an undef $_; inside the loop, but that doesn't seem to actually remove the item from the array, just undef it out (the list continues to have the same # of elements). I suspect I need to splice it, but without having the index of the element of the array, I couldn't figure out how to do it... efficiently.
    You hit the nail on the head. Setting aside efficiency (I have a whole rant prepared for that) trying to splice up an array while iterating over it with for(LIST) is tough business. The easiest thing to do is to attack it with a for(init; test; increment) loop:
    sub remove5s { my($aref)=@_; for(my $i=0; $i<@$aref; $i++) { $_=$$aref[$i]; unless ($_ % 5) { splice(@$aref, $i, 1); $i--; } } }
    This sub removes elements that are evenly divisible by 5. Rather than splice them out, it could just as easily have pushed them onto the end of the list (but you didn't say which list) or swapped it with the element at the end (watch for terminal conditions!).
      Without much research here, wouldn't an AVL tree do what you would like to do? It is optimized (I believe; I hope I'm remembering the correct algorithm) for this type of thing; most often used items are moved to the top, and it is self-balancing too (I believe).
Re: Efficiency Question
by mirod (Canon) on Feb 23, 2001 at 19:25 UTC

    My first reflex if you use a list with ids would be to use a hash instead, the keys being the id and the values the data. To purge an item just do a delete $cache->{$id}.

    If you want to add a time stamp or something so that you can purge the cache every now and then you then either need another hash (id => weight, time or whatever criteria you will use) or store an array [criteria, data] in each value.

      I first had it coded as you suggested, but two thoughts occured to me:
      1. Keeping in inside an array would be faster to walk (foreach (@arr) has to be faster than doing 30 hash lookups).
      2. An order (and count) needed to be represented (for maintainence purposes) anyway, so why not just keep it in an array ref in the first place. I thought of the time stamp, but looking it up and testing for it would consume MUCH more time than a simple FILO algo. Remember, I'm trying my damnest to minimize the work the cache has to do.
      Heck, I even tried blessing an array ref instead of a hash hoping I could make a class from that; but of course that didn't work :)

      Now, here is an addition question. Is

      $class->method();
      any slower than
      Class::method();
      ? And by "slower", I mean technically, not noticably :)

      TIA!

        The general wisdom says that yes, method calls are slower than function calls. (In some cases DRAMATICALLY slower.) For a simple case though, lets find out:
        use Benchmark; package Foo; sub bar { 1; } package main; timethese(1000000, { method => sub { Foo->bar(); }, func => sub { Foo::bar(); }, } );
        Drumroll....
        Benchmark: timing 1000000 iterations of func, method...
        func: 1 wallclock secs ( 1.93 usr + 0.00 sys = 1.93 CPU) @ 518134.72/s (n=1000000)
        method: 6 wallclock secs ( 4.95 usr + 0.00 sys = 4.95 CPU) @ 202020.20/s (n=1000000)
        Yup! When in doubt, Benchmark.
        Yes, 30 hash lookups is slower than walking an array with foreach.

        OTOH one hash lookup is much faster than walking a good chunk of an array.

        If you can arrange that straight lookups happen more often than having to walk the whole cache, then you should get a performance win. (See code in my other node.)

        You should also consider that method invocation on a blessed reference is much faster than method invocation on class name.

        There is rather little performance overhead on object method calls. In addition, the two calls you suggest are not really an apples-apples comparison, since the class method passes the class name as a param, and the direct invocation does not (if factored in, the direct invocation becomes about 20% slower than shown below).

        package Foo; use strict; use warnings; use Benchmark qw(cmpthese); sub new { bless {}, shift; } sub meth { 1 } my $obj = new Foo; cmpthese (-3, { class => sub { Foo->meth($obj) }, method => sub { $obj->meth() }, direct => sub { Foo::meth() }, }); ### results Rate class method direct class 464126/s -- -50% -64% method 923914/s 99% -- -28% direct 1289819/s 178% 40% --
           MeowChow                                   
                       s aamecha.s a..a\u$&owag.print
Re (tilly) 1: Efficiency Question
by tilly (Archbishop) on Feb 23, 2001 at 22:55 UTC
    You said up front that you are reinventing the wheel. However I am kind of curious, why did Memoize not work for you, and did you look at Memoize::Expire?

    Another thing to look at is Tie::Cache.

      That's a really interesting module; never seen it before.

      The reason I'm doing this by hand is to reduce overhead, both in execution and loading. All the prefabbed modules that are designed to do something like this are simply too complex for my needs. Which means writing something that is very specific to the parent.

      I'm just looking for the simpliest, most efficient solution to my question.

      This is more of an infatuation that an actual need.


      Let me put it this way: without using any modules, what is the fastest and smallest caching system you can come up with that stores and recalls blocks of data indexed by a Unique ID. The data is passed as a scalar. The cache must have a maximum size of 30 elements and be able to order itself according to hit frequency.
        Fastest and smallest are incompatible.

        Also do you need at least 30, or exactly?

        If I wanted to be bare bones I would do something like this untested code:

        use strict; use vars qw($max_size $reset_size); $max_size = 60; $reset_size = 30; { my %count; my %cache; sub get_elem { my $id = shift; if (exists $count{$id}) { ++$count{$id}; return $cache{$id}; } else{ return; } } my $size; sub set_elem { { my $id = shift; my $elem = shift; if (exists $count{$id}) { $cache{$id} = $elem; } else { if ($max_size < ++$size) { _clear_cache(); } $count{$id} = 0; $cache{$id} = $elem; } redo if @_; } } sub _clear_cache { my @ids = sort {$count{$b} <=> $count{$a}} keys %count; @ids = @ids[0..($reset_size-1)]; my %new; %count = (); foreach (@ids) { $new{$_} = $cache{$_}; $count{$_} = 0; } %cache = %new; $size = $reset_size; } }
        Note that I go over 30 here, but doing so makes the actual lookup much faster.
Re: Efficiency Question
by jeroenes (Priest) on Feb 23, 2001 at 23:14 UTC
    Upon reading this, I have two thoughts. One, Maybe this is Yet Another Application for the BerkeleyDB. I really feel like I'm giving the same answer to every other Seeker, but Berkeley has a BTree lookup, which is really efficient, and a configurable cache, and it is written in C. Maybe the interface has too much overhead (I don't know), and it also depends on the number of items you want to cache. I mean, for 30 items, it's hardly worth the DB overhead, probably. But maybe it's already something different at 300 items.

    Number two is: C. If you really want to save on the number of instructions, you'd better implement the storage and fetching in C. You can do much more educated guesses about the nature of your items than perl can. But it really depends on your data. Variable length or fixed? Number or text? And of course, interfacing with arrays is cumbersome and instruction-intensive. However, if you keep your cache packed in perl, and as a pointer in C, you will circumvene that.

    If you can give some background on the nature of your data, I can have a try at some C code. Would be fun, methinks.

    Hope this helps,

    Jeroen
    "We are not alone"(FZ)

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://60488]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (4)
As of 2025-06-24 00:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?
    erzuuliAnonymous Monks are no longer allowed to use Super Search, due to an excessive use of this resource by robots.