Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Re (tilly) 3: Efficiency Question

by tilly (Archbishop)
on Feb 23, 2001 at 23:58 UTC ( [id://60541]=note: print w/replies, xml ) Need Help??

in reply to Re: Re (tilly) 1: Efficiency Question
in thread Efficiency Question

Fastest and smallest are incompatible.

Also do you need at least 30, or exactly?

If I wanted to be bare bones I would do something like this untested code:

use strict; use vars qw($max_size $reset_size); $max_size = 60; $reset_size = 30; { my %count; my %cache; sub get_elem { my $id = shift; if (exists $count{$id}) { ++$count{$id}; return $cache{$id}; } else{ return; } } my $size; sub set_elem { { my $id = shift; my $elem = shift; if (exists $count{$id}) { $cache{$id} = $elem; } else { if ($max_size < ++$size) { _clear_cache(); } $count{$id} = 0; $cache{$id} = $elem; } redo if @_; } } sub _clear_cache { my @ids = sort {$count{$b} <=> $count{$a}} keys %count; @ids = @ids[0..($reset_size-1)]; my %new; %count = (); foreach (@ids) { $new{$_} = $cache{$_}; $count{$_} = 0; } %cache = %new; $size = $reset_size; } }
Note that I go over 30 here, but doing so makes the actual lookup much faster.

Replies are listed 'Best First'.
Re: Re (tilly) 3: Efficiency Question
by MeowChow (Vicar) on Feb 24, 2001 at 01:56 UTC
    You can speed up the _clear_cache, which would in probably eat up the most cycles, being called once every 30 insertions, like so:
    sub clear_cache { my @ids = sort {$count{$b} <=> $count{$a}} keys %count; splice @ids, 0, $reset_size-1; delete @cache{@ids}; %count = (); @count{keys %cache} = (0) x $reset_size; $size = $reset_size; }
    Faster to delete elements from an existing hash, I've found, than to rebuild a hash. This of course depends mostly on the ratio of deleted/total hash entries.

    Also, this is isn't quite a frequency-expiration cache, since time is not a factor into the expiration routine, and you would stand a pretty good chance of getting stale cache entries that once had a bunch of hits, but remain in the cache even though they are entirely unused. I'd probably switch the count over to something like a shift-based MRU, called once every 30 fetches:

    my $do_shift = 0; sub get_elem { my $id = shift; if (++$do_shift == 30) { shift_mru(); $do_shift = 0; } if (exists $count{$id}) { $count{$id} |= 0x80000000; return $cache{$id}; } else{ return; } } sub shift_mru { while (my ($id, $count) = each %count) { $count{$id} >>= 1; } }
    I also seem to recall that there are ways to optimize the sorting algorithm for a return of fewer results than the total number to be sorted, but that's getting rather hairy :-)
                   s aamecha.s a..a\u$&owag.print

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://60541]
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (5)
As of 2024-07-17 20:39 GMT
Find Nodes?
    Voting Booth?

    No recent polls found

    erzuuli‥ 🛈The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.