Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister

Optimizing a CHI-based data throttler

by perlancar (Friar)
on Feb 19, 2020 at 05:53 UTC ( #11113136=perlquestion: print w/replies, xml ) Need Help??

perlancar has asked for the wisdom of the Perl Monks concerning the following question:

I'm experimenting on using CHI as the backend of a Data::Throttler-like module. The speed is not great: my module is becoming linearly slower as the max_items parameter is increased: it's about 3 times slower than Data::Throttler with max_items=100, and 20 times slower with max_items=1000. Any idea on how to close the gap, or is my endeavor with CHI in this case a lost cause?

package Data::Throttler_CHI; use strict; use warnings; sub new { my ($package, %args) = @_; bless \%args, $package; } my $counter = 0; sub try_push { my $self = shift; my $now = time(); $counter++; $counter = 0 if $counter == 2e31; # wraparound 32bit int $self->{cache}->set("$now|$counter", 1, $self->{interval}); # Y228 +6! my @keys0 = $self->{cache}->get_keys; my @keys; for my $key (@keys0) { my ($key_time, $key_serial) = split /\|/, $key, 2; if ($key_time >= $now - $self->{interval}) { push @keys, $key; } } # these drivers return expired keys: Memory. so we need to purge t +hese keys my $do_purge = rand() < 0.05; # probabilistic $self->{cache}->purge if $do_purge && @keys < @keys0; return @keys <= $self->{max_items} ? 1:0; } 1;

More complete code with documentation and tests is on CPAN: Data::Throttler_CHI.

Replies are listed 'Best First'.
Re: Optimizing a CHI-based data throttler
by bliako (Parson) on Feb 19, 2020 at 08:59 UTC

    (I will not comment on how efficient the general idea is (with the specific key structure and counters etc.) as this is only part of the code. I will comment on just what I see here.) I understand that you loop over ALL keys to find the non-expired keys. Then you forget about the hard-earned array of NON expired keys @keys you have constructed and proceed to conditionally call purge() which probably has to do it again (internally, it has to find the set of expired keys this time I guess), in a similar or more efficient way.

    If you want to retain the data structure you are using, then you could break the loop as soon as you find even one expired key and forget about the array, just retain the fact that expired keys were found and purge() must be called (conditionally). Given that you purge if there is at least one expired key (on a 5% basis). Then you are left with finding if there is additional space (the return 1:0) but can't purge() return you the number of keys purged so that you could calculate if there is still space?

    A more drastic change would be to find all expired keys yourself. Given that you are doing it already now, or with my above suggestion you will probably have to partially do it, it will be a benefit if you believe you can do it in a more efficient way than purge. But then you must pass that set of expired keys you found to purge() so that it does not have to re-construct it. Does it take hints? And does it return some info?

    A different path would be to observe that purging will be done when there are expired keys ONLY 5% of the time (rand()<0.05). So, if there is another way to calculate whether there is space left (return @keys <= $self->{max_items} ? 1:0;) the expensive loop AND purge() could be avoided. You only have to find expired keys 5% of the time. But you need alternative way to find out if you are full.

    bw, bliako

      Thanks for the comments. I've just scrapped the first implementation, because recording every "push" in a separate cache key quickly becomes too slow as the number of "push"es/keys range in the thousands.
Re: Optimizing a CHI-based data throttler
by perlancar (Friar) on Feb 19, 2020 at 07:17 UTC

    Managed to increase speed by performing binary search on @keys. My module is now only about 40% slower when max_items=100 and 4x slower when max_items=1000.

    package Data::Throttler_CHI; use strict; use warnings; use List::BinarySearch::XS qw(binsearch_pos); sub new { my ($package, %args) = @_; bless \%args, $package; } my $counter = 0; sub try_push { my $self = shift; my $now = time(); $counter++; $counter = 0 if $counter == 2e31; # wraparound 32bit int $self->{cache}->set(sprintf("%010d %d", $now, $counter), 1, $self- +>{interval}); # Y2286! # we assume the driver returns keys in insert order, to avoid havi +ng to sort() #my @keys = sort $self->{cache}->get_keys; my @keys = $self->{cache}->get_keys; return 1 if @keys <= $self->{max_items}; my $time_expired = $now - $self->{interval}; my $pos_not_expired = binsearch_pos { no warnings 'numeric'; $a <= +> $b } $time_expired, @keys; $self->{cache}->purge if rand() < 0.01; (@keys - $pos_not_expired) <= $self->{max_items} ? 1:0; } 1;
Re: Optimizing a CHI-based data throttler
by perlancar (Friar) on Feb 19, 2020 at 12:13 UTC
    As noted in my other comment, I'm scrapping this first implementation because it quickly becomes too slow when the number of keys/hits reaches thousands. I'm now using the "buckets" approach like done in Data::Throttler. And my module is now consistently an order of magnitude slower than Data::Throttler for various values of max_items.
    package Data::Throttler_CHI; use strict; use warnings; use List::Util qw(sum); sub new { my ($class, %args) = @_; defined $args{max_items} or die "new: Please specify max_items"; $args{max_items} >= 1 or die "new: max_items must be at least 1 +"; defined $args{interval} or die "new: Please specify interval"; $args{interval} >= 1 or die "new: interval must be at least 1" +; defined $args{cache} or die "new: Please specify cache"; # calculate nof_buckets my $nof_buckets; if (defined $args{nof_buckets}) { $args{nof_buckets} >= 1 or die "new: nof_buckets must be at le +ast 1"; $nof_buckets = $args{nof_buckets}; } else { $nof_buckets = $args{interval} ** 0.5; } $nof_buckets = int($nof_buckets); my $self = { t0 => time(), max_items => $args{max_items}, interval => $args{interval}, cache => $args{cache}, nof_buckets => $nof_buckets, secs_per_bucket => $args{interval} / $nof_buckets, }; bless $self, $class; } sub try_push { my $self = shift; my $now = time(); my $secs_after_latest_interval = ($now - $self->{t0}) % $self->{in +terval}; my $bucket_num = int( $secs_after_latest_interval / $self->{interval} * $self->{nof_ +buckets} ) + 1; # 1 .. nof_buckets my $hits = $self->{cache}->get("hits.$bucket_num"); my $all_hits = $self->{cache}->get_multi_arrayref( [map {"hits.$_"} 1..$self->{nof_buckets}]); my $total_hits = sum(grep {defined} @$all_hits) || 0; return 0 if $total_hits >= $self->{max_items}; if ($hits) { $self->{cache}->set( "hits.$bucket_num", $hits+1, {expires_at=>$self->{cache}->get_expires_at("hits.$bucket_ +num")}); } else { $self->{cache}->set( "hits.$bucket_num", 1, {expires_at => $now + $self->{interval} - $secs_after_late +st_interval + ($bucket_num-1) * $self->{secs_per_bucket}}); } 1; } 1;

    Once again, the more complete version resides on CPAN.

      Now, performance obviously depends on CHI's search/get/set and I don't know what that is (can't see that from a diagonal look at the documentation).

      But something obvious is that Data::Throttler uses the notion of buckets, which are counters for a time range. And from what I can gather it uses an array to store those buckets. But you (CHI that is) use a hashtable (I guess?). So there is definetely a penalty there in terms of amortized performance but definetely it is not accounting for the order of magnitude you observe. But, still, the advantage of Data::Throttler is that it uses data structures and algorithms for that specific use-case (of summing buckets). It pays to be specific.

      So, it costs time to create the list of indices of items, fetch them and then sum them, as you do. Data::Throttler will only sum a continuous range (i.e. a loop) in its internal buckets array. That's a 3rd of your cost already. If CHI had a method to apply an operation (sum) over a list of indices or better still a continuous range of items, you could cut your costs a bit. But then that's not something one would expect from cache storage!

      The "really bad" news is that Data::Throttler can optimise its internal logic even more. And also add more features. Can you keep up with that by using a generic tool?

      Having said all that, I can think of one way to improve performance (but then Data::Throttler can also do that and be ahead, again). Instead of having each bucket storing a count of hits over its time-interval, make it keep an incremental count of hits. I.e. keep the sum of counts in each bucket of all its previous buckets (not the dead ones). In this way you do not need to do a sum each time try_push() is called. You will have that sum ready waiting for you in the last bucket. However, that sum will be invalid once a bucket is too old and had to be removed. Whenever a bucket gets too old, you have to subtract its count from the total count. So, create a function total_count() which either re-calculates the count if buckets were invalidated since last time it was called, or returns a cached count. Re-calculating is really easy: subtract the difference of the old bucket's count to its newer neighbour from all buckets. Then, if a hit is added, you only need to add it to the newest bucket. The total count will be held by the newest bucket and will need to be re-calculated if you ask for it after time-interval seconds since last time you called it. So, here is finally some use of cache :)

      bw, bliako

Re: Optimizing a CHI-based data throttler
by bliako (Parson) on Feb 20, 2020 at 13:31 UTC

    Having done all the rubber-ducking, an epiphany: you do not need to re-calculate the sum of bucket counts every time you try_push(). Instead cache this value and re-calculate it only if a bucket died since last try_push()

    bw, bliako

      Thanks, I'll remember this as one strategy to try. But first I'll do some profiling first.

        Preliminary profiling shows that indeed CHI is the bottleneck. 95% of the time is spent outside of Data::Throttler_CHI itself, while CHI::* code itself occupies at least ~73%; the rest of the overhead involves serialization, logging, and so on. Seems like reducing the number of cache item retrievals from CHI would be the primary strategy for speed-up. Or in other words, we need to cache CHI itself, thereby defeating its purpose in the first place :-)

        On the other hand, I manage to speed up Data::Throttler itself significantly simply by replacing Log::Log4perl with Log::ger. The Data::Throttler code logs a lot, so by removing logging statement (e.g. using Log::ger::Plugin::OptAway) we can cut the time into about one third!

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://11113136]
Approved by haukex
Front-paged by Corion
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (5)
As of 2020-04-01 05:44 GMT
Find Nodes?
    Voting Booth?
    To "Disagree to disagree" means to:

    Results (186 votes). Check out past polls.