Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

The arbitrary operator maker

by jimt (Chaplain)
on Nov 22, 2006 at 19:14 UTC ( [id://585598]=perlmeditation: print w/replies, xml ) Need Help??

Over in Ultra compact set searching, I had a little function called "and_maker" which inspired two comments, so I figured I'd generalize it and give it its own node.

The general problem is: I have an operation I would like to perform on each element in a list.

It's very easy to state. It's very easy to think about. "I want to add up these 5 items." "I want to multiply these 7 things." "I want to xor these 83 elements." Like so many things, the devil's in the details. While you can hardwire your function to add up exactly 5 things, you may need to add up 6 next time. And maybe 4 after that.

The standard approach is to use an iterator. I'll show you a different way. Not necessarily better, but different.

A standard way to do this might be something like the following. Say you want to add up a list of elements:

sub multi_add { my $sum = 0; $sum += $_ foreach @_; return $sum; } my $result = multi_add(3,4,5);

What we're gonna do instead is dynamically generate a subroutine that only operates on the number of elements you have for this call. We'll be slick and cache the function so that subsequent calls give us back the same code snippet. Call the arbitrary_operator_maker with two arguments - the operator and the number of elements.

{ my %dispatch = (); sub arbitrary_operator_maker { my $operation = shift; my $num_elements = shift; my $dispatch_key = "$operation$num_elements"; return $dispatch{$dispatch_key} if $dispatch{$dispatch_key}; my $built = join $operation, map {'$_[' . $_ . ']'} (0..$num_eleme +nts - 1); return $dispatch{$dispatch_key} = eval "sub { $built }"; } }

You then create a new coderef for what you want to do:

my $add_3_things = arbitrary_operator_maker('+', 3); my $result = $add_3_things->(3,4,5); print "$result\n"; my $other_result = $add_3_things->(77,88,99); print "$other_result\n"; #etc.

The function generated is straightforward. If you looked at what it actually does (which you can see by dumping out the $built variable inside the arbitrary_operator_maker, it builds up your list of elements once for you, for repeated use.

my $add_3_things = arbitrary_operator_maker('+', 3); $add_3_things internally looks like: sub { $_[0] + $_[1] + $_[2] } my $xor_5_things = arbitrary_operator_maker('^', 5); $xor_5_things internally looks like: sub { $_[0] ^ $_[1] ^ $_[2] ^ $_[3] ^ $_[4] }

Okay, so what? Well, this newly created function doesn't need to iterate, it just performs the operation as many times as you need. Performance can be impressive, as I'd written up over here. But, in other testing I'd done, its scaling is twitchy.

There also is overhead with doing the lookup and dispatching a pre-created function. If you can cache yourself in advance, the benefits are impressive. But my caching routine may just suck.

my $op = '+'; my $and_maker_5 = arbitrary_operator_maker($op, 5); my @list_5 = (1..5); timethese(1000000, { 'and_maker 5 elements cached' => sub { $and_maker_5->(@list_5); }, 'and_maker 5 elements dispatched' => sub { my $and_maker_5 = arbitrary_operator_maker($op, 5); $and_maker_5->(@list_5); }, 'multi_add 5 elements' => sub { multi_add(@list_5); }, }); Benchmark: timing 1000000 iterations of and_maker 5 elements cached, a +nd_maker 5 elements dispatched, multi_add 5 elements... and_maker 5 elements cached: 2 wallclock secs ( 0.98 usr + 0.00 sys += 0.98 CPU) @ 1020408.16/s (n=1000000) and_maker 5 elements dispatched: 4 wallclock secs ( 2.95 usr + 0.01 +sys = 2.96 CPU) @ 337837.84/s (n=1000000) multi_add 5 elements: 3 wallclock secs ( 2.53 usr + 0.01 sys = 2.54 + CPU) @ 393700.79/s (n=1000000)

Basically, a technique like this would be useful if you're going to do the same operation many many times over the same number of arguments. If not, the overhead of building the function or even looking up a pre-cached function may cause performance to fall below that of an iterator. This version of the operator maker also does no bounds checking. So you can call $add_5_things with 3 elements or 97 of them. That may be a bad thing. YMMV.

Replies are listed 'Best First'.
Re: The arbitrary operator maker
by samtregar (Abbot) on Nov 22, 2006 at 20:30 UTC
    Have you seen List::Util's reduce routine? It's supposed to be pretty fast.

    -sam

      I have and it's not. Or, at least, it's not faster. But it's also more general purpose.

      my $op = '+'; my $add_5_things = arbitrary_operator_maker($op, 5); my @list_5 = (1..5); timethese(1000000, { 'add_5_things elements cached' => sub { $add_5_things->(@list_5); }, 'add_5_things elements dispatched' => sub { my $add_5_things = arbitrary_operator_maker($op, 5); $add_5_things->(@list_5); }, 'multi_add 5 elements' => sub { multi_add(@list_5); }, 'reduce' => sub { reduce { $a + $b} @list_5; } }); Benchmark: timing 1000000 iterations of add_5_things elements cached, +add_5_things elements dispatched, multi_add 5 elements, reduce... add_5_things elements cached: 0 wallclock secs ( 0.99 usr + 0.00 sys + = 0.99 CPU) @ 1010101.01/s (n=1000000) add_5_things elements dispatched: 2 wallclock secs ( 2.94 usr + -0.00 + sys = 2.94 CPU) @ 340136.05/s (n=1000000) multi_add 5 elements: 2 wallclock secs ( 2.52 usr + 0.01 sys = 2.53 + CPU) @ 395256.92/s (n=1000000) reduce: 4 wallclock secs ( 3.24 usr + 0.01 sys = 3.25 CPU) @ 30 +7692.31/s (n=1000000)

      Or, with a bigger set:

      my $op = '+'; my $add_500_things = arbitrary_operator_maker($op, 500); my @list_500 = (1..500); timethese(10000, { 'add_500_things elements cached' => sub { $add_500_things->(@list_500); }, 'add_500_things elements dispatched' => sub { my $add_500_things = arbitrary_operator_maker($op, 500); $add_500_things->(@list_500); }, 'multi_add 5 elements' => sub { multi_add(@list_500); }, 'reduce' => sub { reduce { $a + $b} @list_500; } }); Benchmark: timing 10000 iterations of add_500_things elements cached, +add_500_things elements dispatched, multi_add 5 elements, reduce... add_500_things elements cached: 1 wallclock secs ( 0.72 usr + 0.00 s +ys = 0.72 CPU) @ 13888.89/s (n=10000) add_500_things elements dispatched: 0 wallclock secs ( 0.74 usr + 0. +00 sys = 0.74 CPU) @ 13513.51/s (n=10000) multi_add 5 elements: 0 wallclock secs ( 1.03 usr + 0.00 sys = 1.03 + CPU) @ 9708.74/s (n=10000) reduce: 1 wallclock secs ( 1.18 usr + 0.00 sys = 1.18 CPU) @ 84 +74.58/s (n=10000)

      Of course, it's also much more general purpose than the arbitrary_operator_maker is, which is highly specialized. But for these tests, though, reduce was the slowest of the bunch. Benchmarks are your friend.

        In your final comparison between reduce and other operations you note that being specialized allows you to be faster. Sure. Ok. But you didn't buy much with that.

        ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re: The arbitrary operator maker
by ikegami (Patriarch) on Nov 22, 2006 at 21:16 UTC
    You mention performing operations 83, 97 and even 4000 elements. I'd just like to point out that some floating points operations are more accurate when performed on numbers of similar magnitude, so using an accumulator (such as all the presented methods use) is not ideal in those circumstances.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://585598]
Approved by Corion
Front-paged by tye
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2024-03-19 10:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found