Re^2: Threads and fork and CLONE, oh my!

by xdg (Monsignor)
on Aug 12, 2005 at 11:09 UTC


in reply to Re: Threads and fork and CLONE, oh my!
in thread Threads and fork and CLONE, oh my!

The benefits of inside-out/flyweight objects have been beaten to death on this board; they center primarily on stronger encapsulation of data and on immunity to property name clashes with super/subclasses. However, the approach needs a unique ID. For reasons lost to history, someone used "$self" (hey, it's unique, right?) because it was cheaper than generating a unique ID; from there it was a short step to using just the memory address part, and the cargo cult followed.
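For readers who have not seen the technique, here is a minimal sketch of an inside-out class keyed by the memory address. The class name `Counter` and the property hash `%count_of` are made up for illustration; this is not code from any particular CPAN module.

```perl
use strict;
use warnings;

package Counter;
use Scalar::Util qw( refaddr );

# One lexical hash per property, keyed by the object's memory address.
# Nothing outside this file can reach the data.
my %count_of;

sub new {
    my ($class) = @_;
    # The kind of referent is irrelevant; an anonymous scalar is enough.
    my $self = bless \do { my $anon }, $class;
    $count_of{ refaddr($self) } = 0;
    return $self;
}

sub increment { return ++$count_of{ refaddr( $_[0] ) } }
sub count     { return $count_of{ refaddr( $_[0] ) } }

# Addresses get reused once an object is freed, so clean up explicitly.
sub DESTROY { delete $count_of{ refaddr( $_[0] ) } }

package main;

my $c = Counter->new;
$c->increment for 1 .. 3;
print $c->count, "\n";    # prints 3
```

Because the key is the address and not anything stored in the referent, a subclass can use the same trick without knowing what kind of reference its parent blessed.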

I think the fundamental storage technique is sound and using a UUID would fix up the refaddr problem -- though as I said, at the cost of coupling superclasses/subclasses more tightly. It's fine if everything in the class hierarchy is built the same way (e.g. on a blessed scalar with a UUID inside), but one loses the ability to subclass someone else's class (e.g. on CPAN) without caring what kind of blessed reference they used (hash, array, etc.) or whether it changes in some future version. For some, that may be a bigger benefit.

I'd still like to hear people's views on that topic -- whether that is important enough to justify the extra complexity of CLONE. I'd also like to get people's views on whether adding external dependencies on Data::UUID and/or Win32API::GUID is worthwhile or whether some other inline, pure-Perl "unique id" algorithm is preferable (with, say, Time::HiRes, process ID, hostname/IP, etc.).
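To make "the extra complexity of CLONE" concrete, here is a rough sketch of what a refaddr-keyed class has to add to survive thread creation. It assumes a weak-reference registry (`%registry`, a name invented here) so that the class can find its live objects; the manual `Counter->CLONE` call at the end only simulates the hook, which Perl itself calls as a class method inside each new thread.

```perl
use strict;
use warnings;

package Counter;
use Scalar::Util qw( refaddr weaken );

my %count_of;    # property storage, keyed by refaddr
my %registry;    # refaddr => weak reference to the live object

sub new {
    my ($class) = @_;
    my $self = bless \do { my $anon }, $class;
    my $id   = refaddr($self);
    $count_of{$id} = 0;
    weaken( $registry{$id} = $self );    # must not keep the object alive
    return $self;
}

sub increment { return ++$count_of{ refaddr( $_[0] ) } }
sub count     { return $count_of{ refaddr( $_[0] ) } }

sub DESTROY {
    my $id = refaddr( $_[0] );
    delete $count_of{$id};
    delete $registry{$id};
}

# In a new thread the cloned objects live at new addresses, but the data
# is still filed under the addresses they had in the parent thread.
sub CLONE {
    # Snapshot both hashes and rebuild them from scratch, so that a new
    # address can never collide with a stale key.
    my %old_registry = %registry;    # copies are strong refs, briefly
    my %old_count    = %count_of;
    %registry = ();
    %count_of = ();
    for my $old_id ( keys %old_registry ) {
        my $object = $old_registry{$old_id};
        next if !defined $object;
        my $new_id = refaddr($object);
        $count_of{$new_id} = $old_count{$old_id};
        weaken( $registry{$new_id} = $object );
    }
    return;
}

package main;

my $c = Counter->new;
$c->increment;
Counter->CLONE;           # simulated by hand; no threads involved here
print $c->count, "\n";    # prints 1
```

The registry and the re-keying loop are the price of using the address as the ID; a class that stores its own ID inside the referent needs neither.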

-xdg

Code written by xdg and posted on PerlMonks is public domain. It is provided as is with no warranties, express or implied, of any kind. Posted code may not have been tested. Use of posted code is at your own risk.

Re^3: Threads and fork and CLONE, oh my!
by adrianh (Chancellor) on Aug 17, 2005 at 15:08 UTC
    for reasons lost to history, someone used "$self" (hey, it's unique, right?) as cheaper than generating a unique ID

    Unfortunately it turns out that it isn't very cheap at all (speed-wise). In fact it's just about the worst possible choice :-) On my perl 5.8.7 this basic benchmark:

    Gives me:

    BlessedHash x 10000 = 2265688 bytes
    ClassStd x 10000 = 2222948 bytes
    NumSelfAsIndex x 10000 = 2219534 bytes
    RefaddrCached x 10000 = 2300888 bytes
    RefaddrCall x 10000 = 2226816 bytes
    SelfAsIndex x 10000 = 2436816 bytes

                     Rate SelfAsIndex ClassStd NumSelfAsIndex RefaddrCall RefaddrCached BlessedHash
    SelfAsIndex    1000/s          --      -9%           -12%        -44%          -57%        -59%
    ClassStd       1100/s         10%       --            -3%        -38%          -53%        -55%
    NumSelfAsIndex 1131/s         13%       3%             --        -36%          -52%        -54%
    RefaddrCall    1778/s         78%      62%            57%          --          -24%        -27%
    RefaddrCached  2349/s        135%     114%           108%         32%            --         -4%
    BlessedHash    2443/s        144%     122%           116%         37%            4%          --

    with a plain $self index coming in a lot worse than the faster alternatives.

      The performance of the blessed hash case is dependent on the length of the keys used in the hash: The longer the key, the more time it takes!

      For one-character keys, blessed hashes are slightly faster than the cached refaddr case. (I got 2% when I did the timings.) However, one-character keys are rather unrealistic, and definitely not good programming practice.

      For two-character keys, the performance is the same.

      For three or more characters, cached refaddr is faster! I think five characters is realistic, and with keys that long blessed hashes are 2% slower. For ten characters, 7% slower!

      So if I were to call a winner, cached refaddr would be it.

      On another minor note, 0+$self yields the same result as the refaddr function. So you can eliminate 'use Scalar::Util', and just cache 0+$self.


      Remember: There's always one more bug.
        The performance of the blessed hash case is dependent on the length of the keys used in the hash: The longer the key, the more time it takes!

        You're right, but this isn't the reason that using $self is so much slower. Stringification of references is just slow:

        #! /usr/bin/perl
        use strict;
        use warnings;
        use Benchmark qw( cmpthese );

        my $self   = bless {}, 'SomeClass';
        my $string = "$self";
        my %a = ( $self   => 0 );
        my %b = ( $string => 0 );

        cmpthese(-1, {
            self   => sub { $a{ $self }   = $a{ $self }   + 1 },
            string => sub { $b{ $string } = $b{ $string } + 1 },
        });

        __END__
        # on my perl 5.8.7
                   Rate   self string
        self   156393/s     --   -83%
        string 927942/s   493%     --

        On another minor note, 0+$self yields the same result as the refaddr function.

        Unless you overload arithmetic.
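A small demonstration of that caveat, with two throwaway classes (`Plain` and `Numeric`, names invented here): for an ordinary object 0+$self and refaddr agree, but once the class overloads numification they part ways.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );

package Plain;
sub new { return bless {}, shift }

package Numeric;
# Overloaded numification: the object now "is" 42 in numeric context.
use overload '0+' => sub { 42 }, fallback => 1;
sub new { return bless {}, shift }

package main;

my $plain   = Plain->new;
my $numeric = Numeric->new;

my $plain_same   = ( 0 + $plain   == refaddr($plain) )   ? 1 : 0;
my $numeric_same = ( 0 + $numeric == refaddr($numeric) ) ? 1 : 0;

print "Plain:   ", ( $plain_same   ? "same" : "different" ), "\n";
print "Numeric: ", ( $numeric_same ? "same" : "different" ), "\n";
```

refaddr looks at the reference itself and ignores overloading, which is why it is the safer key for a class that other people may subclass.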

        Cool. Hadn't realized that was the case about hash keys.

        I'm a little surprised at the refaddr versus 0+$self conclusion, though -- I would have thought that refaddr is just XS that returns a memory address, whereas 0+$self would wind up casting things to Perl scalars with associated overhead. I guess it's optimized away. Good to know.

        -xdg


        On another minor note, 0+$self yields the same result as the refaddr function. So you can eliminate 'use Scalar::Util', and just cache 0+$self.

        Only when numification is NOT overloaded. And in earlier perls you can't unoverload numification.

        ---
        $world=~s/war/peace/g

      I sort of meant that tongue-in-cheek... but the benchmarks are neat to see. Actually I mean 'cheap' to use $self as opposed to some other "guaranteed" unique ID like a UUID (e.g. Data::UUID). I suspect that any of the $self as index variations will be faster than a UUID-as-index variation.

      I've also pondered a lighter-weight, pure-perl alternative like packing Time::HiRes::gettimeofday() and the memory address of an anonymous lexical during construction, as a memory address at a point in time should be unique on a single machine and "global" uniqueness isn't so much an issue for this kind of object ID.
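A rough sketch of what such a pure-Perl ID might look like. The helper name `new_object_id` is hypothetical, and the process ID and the `$sequence` counter are additions not described in the paragraph above: they are there because two calls in the same microsecond could otherwise see the same lexical address.

```perl
use strict;
use warnings;
use Scalar::Util qw( refaddr );
use Time::HiRes qw( gettimeofday );

my $sequence = 0;    # assumption: tie-breaker, not part of the original idea

# Hypothetical helper, not taken from any module.
sub new_object_id {
    my ( $seconds, $microseconds ) = gettimeofday();
    my $address = refaddr( \my $anon );    # address of a fresh lexical
    # Hex fields joined with '-' so field boundaries stay unambiguous.
    return join '-', map { sprintf '%x', $_ }
        $seconds, $microseconds, $$, $address, ++$sequence;
}

my %seen;
$seen{ new_object_id() }++ for 1 .. 1000;
print scalar( keys %seen ), " distinct IDs\n";    # prints "1000 distinct IDs"
```

This only aims at uniqueness on a single machine, which is all the paragraph above asks for; it makes no attempt at the global guarantees of a real UUID.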

      On the other hand, I'm personally moving away from the unique ID answer as CLONE works for those few who dare to muck with threads, and I don't think the performance hit of sharing objects across threads for those few people who might want it is going to be worth giving up the promiscuous nature of inside-out objects as a general property.

      (And for the next headache/magic-trick I'm considering with Object::LocalVars: trying out lexical closures to anonymous globrefs instead of package globals for storage to give local aliasing and encapsulation. And then see if I can get it running without too much of a performance hit against other options. Sign me up for The Perl Crackpot Index, I guess.)

      -xdg


Re^3: Threads and fork and CLONE, oh my!
by dragonchild (Archbishop) on Oct 03, 2005 at 02:42 UTC
    It's fine if everything in the class hierarchy is built the same way (e.g. on a blessed scalar with a UUID inside), but one loses the ability to subclass someone else's class (e.g. on CPAN) without caring what kind of blessed reference they used (hash, array, etc.) or whether it changes in some future version. For some, that may be a bigger benefit.

    That's going to be a problem no matter how you represent your object in memory. The only way around that is if you give over generating new attributes (and accessors for said attributes) to some other entity that will then do it in the same manner for all classes in the hierarchy.

    This fact, btw, is the biggest win for P6 OO. The method of implementation is less important than the fact of implementation. There is now some arbiter of attribute/accessor generation that will do it the same way every time. It will also resolve clashes in some sane and user-definable manner. Beyond that, it's all gravy.


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
