
A benchmark

by Jeffrey Kegler (Hermit)
on Oct 07, 2007 at 00:50 UTC

in reply to Re: Unique numeric ID for reference?
in thread Unique numeric ID for reference?

Actually, to my surprise, simply sticking an ID number in the referred-to object (it's an array) and dereferencing it is fastest. Here are the numbers:
Benchmark: running ID Field, Numeric, Refaddr, String for at least 3 CPU seconds...
  ID Field:  4 wallclock secs ( 3.43 usr +  0.03 sys =  3.46 CPU) @ 1044124.57/s (n=3612671)
   Numeric:  4 wallclock secs ( 3.34 usr +  0.04 sys =  3.38 CPU) @ 989664.50/s (n=3345066)
   Refaddr:  5 wallclock secs ( 3.08 usr +  0.03 sys =  3.11 CPU) @ 912330.23/s (n=2837347)
    String:  4 wallclock secs ( 3.09 usr +  0.04 sys =  3.13 CPU) @ 747332.27/s (n=2339150)
              Rate String Refaddr Numeric ID Field
String    747332/s     --    -18%    -24%     -28%
Refaddr   912330/s    22%      --     -8%     -13%
Numeric   989664/s    32%      8%      --      -5%
ID Field 1044125/s    40%     14%      6%       --
And here's the code that did the Benchmark:
#!perl -w
use strict;
use Benchmark qw(:all);
use Scalar::Util qw(refaddr);

my $r = [0];

sub string  { pack("A", "$r"); }
sub numeric { pack("J", $r+0); }
sub addr    { pack("J", refaddr $r); }
sub field   { pack("J", $r->[0]); }

my $result = timethese(-3, {
    String     => \&string,
    Numeric    => \&numeric,
    Refaddr    => \&addr,
    "ID Field" => \&field,
});
cmpthese($result);

Replies are listed 'Best First'.
Re: A benchmark
by Juerd (Abbot) on Oct 07, 2007 at 14:05 UTC

    0 doesn't really count as a unique id, does it?

      Since my test data has exactly one sort record, any ID number would be unique. :-) I'm trying to focus on the per-record time for a pre-pass to a sort, so I think a single-record database captures the aspects of the problem I care about.

      I ran more numbers, by the way, looking at what happens if you have to deal with potentially undefined records. It gets complicated depending on whether you can turn off warnings, whether you have to explicitly test for undefinedness, whether you need multiple levels of, etc., etc. The numeric solution ($ref+0) and the indirection-to-unique-identifier solution ($ref->[0]) run neck and neck, continually swapping first and second place with every small change in the assumptions.
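
A minimal sketch of the kind of variants that comparison involves, assuming an undefined record should map to ID 0. The sub names are hypothetical and are not the ones used in the original runs:

```perl
use strict;
use warnings;

# Hypothetical variants: each returns a packed key for one record,
# using 0 (an assumption) when the record is undefined.

# Numeric solution with an explicit definedness test.
sub numeric_key {
    my ($ref) = @_;
    return pack("J", defined $ref ? $ref + 0 : 0);
}

# Numeric solution with the warning turned off instead of a test;
# undef numifies to 0.
sub numeric_key_nowarn {
    my ($ref) = @_;
    no warnings 'uninitialized';
    return pack("J", $ref + 0);
}

# ID-field solution with an explicit definedness test.
sub field_key {
    my ($ref) = @_;
    return pack("J", defined $ref ? $ref->[0] : 0);
}
```

Each of these can be wrapped in a closure and handed to timethese in the same way as the four subs in the benchmark above.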

      My conclusion is that they're close enough in terms of efficiency that even in time-efficiency driven situations, you can let other factors (readability, space-efficiency, etc.) decide.

      I've coded it up using your suggestion of forcing the reference to numeric ($ref+0). As I said, I decided efficiency was a tie, and by using the references as the subkeys I save the extra logic needed to create and track an extra data field.
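
The actual sort code isn't shown in this thread, so the following is only a sketch of how a reference forced to numeric might serve as a subkey in a sort pre-pass. The record layout, the 8-character primary key, and the %record_at lookup table are all assumptions made for illustration:

```perl
use strict;
use warnings;

# Assumed record layout: an array ref whose first element is the sort field.
my @records = ( ["pear"], ["apple"], ["fig"] );

# Pre-pass: build one packed key per record. The primary key is padded
# (or truncated) to 8 characters; the trailing "J" field is the record's
# address, obtained by forcing the reference to numeric.
my %record_at;    # hypothetical table mapping address back to record
my @keys = map {
    my $id = $_ + 0;
    $record_at{$id} = $_;
    pack("A8 J", $_->[0], $id);
} @records;

# Plain string sort on the packed keys, then recover each record
# from the address stored in its key.
my @sorted = map { $record_at{ (unpack "A8 J", $_)[1] } } sort @keys;

print join(",", map { $_->[0] } @sorted), "\n";
```

Because the address comes straight from the reference, nothing has to be stored in the record itself, which is the saving described above.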

      I do wonder why the refaddr code in the non-XS Scalar::Util stringifies the reference and then pulls the number out with a regex. As far as I can tell, in terms of both complexity and time-efficiency that's clearly inferior to forcing the reference to numeric.
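
A small sketch comparing the approaches on a plain array reference. The regex here is written for illustration and is not copied from Scalar::Util. The Counter class is hypothetical; it is included because overloaded numification is one case where $ref+0 and refaddr give different answers, which may or may not have anything to do with how the fallback was written:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $r = [0];

# Stringify the reference and parse the address out of "ARRAY(0x...)".
my ($hex) = "$r" =~ /0x([0-9a-f]+)/i;
my $from_string = do {
    no warnings 'portable';    # hex() warns on values above 0xffffffff
    hex $hex;
};

my $from_numeric = $r + 0;        # force the reference to numeric
my $from_refaddr = refaddr $r;    # Scalar::Util

print "all three agree\n"
    if $from_string == $from_numeric && $from_numeric == $from_refaddr;

# Hypothetical class with overloaded numification.
{
    package Counter;
    use overload '0+' => sub { 42 }, fallback => 1;
    sub new { return bless [], shift }
}

my $obj = Counter->new;
my $obj_numeric = $obj + 0;        # goes through the overload
my $obj_refaddr = refaddr $obj;    # ignores the overload
```

For unblessed references like the ones in the benchmark, all three give the same number, so the choice comes down to speed and readability.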

