mr.nick has asked for the wisdom of the Perl Monks concerning the following question:
Alright, so I'm reinventing the wheel with something and I want everyone's opinion. I'm writing a simple circular cache to use inside a module of mine. This cache is specific to the module and I want it to be as fast as possible.
Here's the code so far:
package GNS::Cache;
use strict;
###########
## How's this for strangeness ...
###########
sub new {
    my $class = shift;
    my $size  = shift;
    my $self  = bless {}, $class;
    $self->{size} = ($size ? $size : 30);
    $self->{list} = [];
    $self;
}
sub check {
    my $self = shift;
    my $id   = shift;
    local $_;
    ## walk through the list
    foreach (@{$self->{list}}) {
        if ($_->[0] == $id) {
            return $_->[1];
        }
    }
    return;
}
sub insert {
    my $self = shift;
    my $id   = shift;
    my $data = shift;
    my $list = $self->{list};
    push @$list, [$id, $data];
    shift @$list if $#$list >= $self->{size};
}
sub purge {
    my $self = shift;
    $self->{list} = [];
}
sub dump {
    my $self = shift;
    local $_;
    foreach (@{$self->{list}}) {
        print "$_->[0] = $_->[1]\n";
    }
}
#######
1;
__END__
#######
#1. In the operations that you see, is there any place where a code change could be applied to make it faster?
#2. In the check() function, if a hit is found, I would like the item moved to the top of the list (representing that it should stay in the cache longer). What I need to do is remove the current item ($_ as it stands now) and tack it on to the end of the list via a push. How would you recommend doing it? I don't care about cleverness or obfuscation, but raw number of instructions. I had thought of doing an undef $_; inside the loop, but that doesn't seem to actually remove the item from the array, just undef it out (the list continues to have the same # of elements). I suspect I need to splice it, but without having the index of the element of the array, I couldn't figure out how to do it... efficiently.
#3. In my purge() function, I do a simple $self->{list}=[] to reset the list. I presume Perl's memory management is good enough to handle that style of deallocation. Correct?
There are a bazillion methods of doing this, and I've tried a few, but I'm curious as to everyone else's ideas.
Thanx! And forgive me if this is not well written; I'm just on my first cup of java.
Re: Efficiency Question
by fundflow (Chaplain) on Feb 23, 2001 at 19:18 UTC
Re: Efficiency Question
by clintp (Curate) on Feb 23, 2001 at 19:33 UTC
#2. In the check() function, if a hit is found, I would like the item moved to the top of the list (representing that it should stay in the cache longer). What I need to do is remove the current item ($_ as it stands now) and tack it on to the end of the list via a push. How would you recommend doing it? I don't care about cleverness or obfuscation, but raw number of instructions. I had thought of doing an undef $_; inside the loop, but that doesn't seem to actually remove the item from the array, just undef it out (the list continues to have the same # of elements). I suspect I need to splice it, but without having the index of the element of the array, I couldn't figure out how to do it... efficiently.
You hit the nail on the head. Setting aside efficiency (I have a whole rant prepared for that), trying to splice up an array while iterating over it with for(LIST) is tough business. The easiest thing to do is to attack it with a for(init; test; increment) loop:
sub remove5s {
    my ($aref) = @_;
    for (my $i = 0; $i < @$aref; $i++) {
        $_ = $$aref[$i];
        unless ($_ % 5) {
            splice(@$aref, $i, 1);
            $i--;
        }
    }
}
This sub removes elements that are evenly divisible by 5. Rather than splice them out, it could just as easily have pushed them onto the end of the list (but you didn't say which list) or swapped them with the element at the end (watch for terminal conditions!).
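Applied to the check() method from the original post, that approach might look something like this (a rough, untested sketch along those lines; it keeps the [$id, $data] layout and re-appends a hit so it sits at the end, where insert() treats entries as newest):

sub check {
    my ($self, $id) = @_;
    my $list = $self->{list};
    for (my $i = 0; $i < @$list; $i++) {
        if ($list->[$i][0] == $id) {
            my ($hit) = splice(@$list, $i, 1);   # pull the entry out of its current slot
            push @$list, $hit;                   # re-append it so it survives longest
            return $hit->[1];
        }
    }
    return;
}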
Without much research here, wouldn't an AVL tree do what you would like to do? It is optimized (I believe; I hope I'm remembering the correct algorithm) for this type of thing; most often used items are moved to the top, and it is self-balancing too (I believe).
Re: Efficiency Question
by mirod (Canon) on Feb 23, 2001 at 19:25 UTC
My first reflex, since you use a list with ids, would be to use a hash instead, with the ids as keys and the data as values. To purge an item, just do a delete $cache->{$id}.
If you want to add a time stamp or something so that you can purge the cache every now and then, you either need another hash (id => weight, time, or whatever criterion you will use) or store an array [criteria, data] in each value.
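A bare-bones sketch of that idea (hypothetical code, keeping the same check/insert/purge interface and storing [hit_count, data] in each value so eviction can go by hit frequency):

package GNS::HashCache;    # made-up name for illustration
use strict;

sub new {
    my ($class, $size) = @_;
    return bless { size => $size || 30, cache => {} }, $class;
}

sub check {
    my ($self, $id) = @_;
    my $entry = $self->{cache}{$id} or return;
    $entry->[0]++;                    # bump the hit count
    return $entry->[1];
}

sub insert {
    my ($self, $id, $data) = @_;
    my $cache = $self->{cache};
    if (keys %$cache >= $self->{size}) {
        # evict the least-hit entry to stay within the size limit
        my ($victim) = sort { $cache->{$a}[0] <=> $cache->{$b}[0] } keys %$cache;
        delete $cache->{$victim};
    }
    $cache->{$id} = [0, $data];
}

sub purge {
    my $self = shift;
    $self->{cache} = {};
}

1;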
Is $class->method(); any slower than Class::method();? And by "slower", I mean technically, not noticeably :)
TIA!
The general wisdom says that yes, method calls are slower than function calls. (In some cases DRAMATICALLY slower.) For a simple case, though, let's find out:
use Benchmark;

package Foo;
sub bar {
    1;
}

package main;
timethese(1000000, {
    method => sub { Foo->bar(); },
    func   => sub { Foo::bar(); },
});
Drumroll....
Benchmark: timing 1000000 iterations of func, method...
func: 1 wallclock secs ( 1.93 usr + 0.00 sys = 1.93 CPU) @ 518134.72/s (n=1000000)
method: 6 wallclock secs ( 4.95 usr + 0.00 sys = 4.95 CPU) @ 202020.20/s (n=1000000)
Yup! When in doubt, Benchmark.
Yes, 30 hash lookups are slower than walking an array with foreach.
OTOH, one hash lookup is much faster than walking a good chunk of an array.
If you can arrange for straight lookups to happen more often than having to walk the whole cache, then you should get a performance win. (See the code in my other node.)
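For a rough sense of the difference, a quick (untested) comparison with the core Benchmark module, pitting one hash lookup against a scan of a 30-element array of [id, data] pairs (the ids and data below are made up):

use strict;
use Benchmark qw(cmpthese);

my @list = map { [ $_, "data$_" ] } 1 .. 30;
my %hash = map { ( $_ => "data$_" ) } 1 .. 30;
my $id   = 25;    # an id near the end, so the scan walks most of the array

cmpthese(-1, {
    array_scan  => sub {
        foreach my $pair (@list) {
            return $pair->[1] if $pair->[0] == $id;
        }
        return;
    },
    hash_lookup => sub { $hash{$id} },
});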
You should also consider that method invocation on a blessed reference is much faster than method invocation on a class name.
There is rather little performance overhead on object method calls. In addition, the two calls you suggest are not really an apples-to-apples comparison, since the class method passes the class name as a param and the direct invocation does not (if that is factored in, the direct invocation becomes about 20% slower than shown below).
package Foo;
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub new {
    bless {}, shift;
}

sub meth {
    1
}

my $obj = new Foo;

cmpthese (-3, {
    class  => sub { Foo->meth($obj) },
    method => sub { $obj->meth() },
    direct => sub { Foo::meth() },
});
### results
           Rate  class method direct
class   464126/s     --   -50%   -64%
method  923914/s    99%     --   -28%
direct 1289819/s   178%    40%     --
MeowChow
s aamecha.s a..a\u$&owag.print
Re (tilly) 1: Efficiency Question
by tilly (Archbishop) on Feb 23, 2001 at 22:55 UTC
That's a really interesting module; I've never seen it before.
The reason I'm doing this by hand is to reduce overhead, both in execution and loading. All the prefabbed modules that are designed to do something like this are simply too complex for my needs, which means writing something that is very specific to the parent module.
I'm just looking for the simplest, most efficient solution to my question. This is more of an infatuation than an actual need.
Let me put it this way: without using any modules, what is the fastest and smallest caching system you can come up with that stores and recalls blocks of data indexed by a unique ID? The data is passed as a scalar. The cache must have a maximum size of 30 elements and be able to order itself according to hit frequency.
Fastest and smallest are incompatible.
Also, do you need at least 30, or exactly 30?
If I wanted to be bare-bones, I would do something like this (untested code):
use strict;
use vars qw($max_size $reset_size);
$max_size   = 60;
$reset_size = 30;

{
    my %count;
    my %cache;

    sub get_elem {
        my $id = shift;
        if (exists $count{$id}) {
            ++$count{$id};
            return $cache{$id};
        }
        else {
            return;
        }
    }

    my $size;

    sub set_elem {
        {
            my $id   = shift;
            my $elem = shift;
            if (exists $count{$id}) {
                $cache{$id} = $elem;
            }
            else {
                if ($max_size < ++$size) {
                    _clear_cache();
                }
                $count{$id} = 0;
                $cache{$id} = $elem;
            }
            redo if @_;
        }
    }

    sub _clear_cache {
        my @ids = sort {$count{$b} <=> $count{$a}} keys %count;
        @ids = @ids[0..($reset_size-1)];
        my %new;
        %count = ();
        foreach (@ids) {
            $new{$_}   = $cache{$_};
            $count{$_} = 0;
        }
        %cache = %new;
        $size  = $reset_size;
    }
}
Note that I go over 30 here, but doing so makes the actual lookup much faster.
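A hypothetical usage sketch for those functions (ids and data made up), just to show the calling convention:

set_elem(42 => "some cached data");
set_elem(7  => "more data", 13 => "and more");   # extra id/value pairs are picked up by the redo loop
print get_elem(42), "\n";                        # prints "some cached data" and bumps its hit count
print "miss\n" unless defined get_elem(999);     # an unknown id just returns nothing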
Re: Efficiency Question
by jeroenes (Priest) on Feb 23, 2001 at 23:14 UTC
Upon reading this, I have two thoughts. One: maybe this is Yet Another Application for the BerkeleyDB. I really feel like I'm giving the same answer to every other Seeker, but Berkeley has a BTree lookup, which is really efficient, and a configurable cache, and it is written in C. Maybe the interface has too much overhead (I don't know), and it also depends on the number of items you want to cache. I mean, for 30 items it's hardly worth the DB overhead, probably. But maybe it's already a different story at 300 items.
Number two is: C. If you really want to save on the number of instructions, you'd better implement the storage and fetching in C. You can make much more educated guesses about the nature of your items than Perl can. But it really depends on your data. Variable length or fixed? Number or text? And of course, interfacing with arrays is cumbersome and instruction-intensive. However, if you keep your cache packed in Perl, and as a pointer in C, you will circumvent that.
If you can give some background on the nature of your data, I can have a try at some C code. Would be fun, methinks.
Hope this helps,
Jeroen "We are not alone" (FZ)