Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re^3: Which, if any, is faster?

by Smylers (Pilgrim)
on Jun 01, 2005 at 13:55 UTC ( [id://462446]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Which, if any, is faster?
in thread Which, if any, is faster?

... arrays are probably not that much faster than hashes ...

Indeed not. Instead of switching from arrays to hashes you can probably get a similar speed-up by getting some slightly faster hardware — which'll likely be cheaper than what it'd cost in extra development time to have to deal with inappropriate data structures.

... but hashes sure do take up a lot of extra memory with those verbose keys.. particularly if the verbose keys are containing redundant information.

That simply isn't true. Wanting to have several separate hashes with the same keys is a common situation. Indeed it's one the Perl 5 Porters considered, so they explicitly coded to optimize the memory usage: all hash keys are only stored in memory once no matter no many hashes they are used in -- see perl5004delta.

To summarise: use hashes where each key contains some unique information, and use arrays where key names would be duplicated under a hash arrangement.

To summarize: don't try to second-guess Perl, and don't assume that you are cleverer that the Perl 5 Porters.

Smylers

Replies are listed 'Best First'.
Re^4: Which, if any, is faster? (20% faster/ 40% smaller)
by BrowserUk (Patriarch) on Jun 01, 2005 at 19:52 UTC

    I read the following as array-based objects are 20% faster and 70% 40% smaller than the hash based equivalent for little extra effort and no loss of clarity.

    #! perl -slw use strict; package ArrayBased; use constant { CLASS => 0, SELF => 0, FIRST => 0, SECOND => 1, THIRD => 2, FOURTH => 3, FIFTH => 5, }; sub new { return bless [ qw[ one two three four five ] ], $_[CLASS]; } sub method { (undef) = $_[SELF][FIRST]; (undef) = $_[SELF][SECOND]; (undef) = $_[SELF][THIRD]; (undef) = $_[SELF][FOURTH]; (undef) = $_[SELF][FIFTH]; $_[SELF][FIRST] = 1; $_[SELF][SECOND]= 2; $_[SELF][THIRD] = 3; $_[SELF][FOURTH]= 4; $_[SELF][FIFTH] = 5; } package HashBased; sub new { my( $class ) = @_; return bless { FIRST => 'one', SECOND => 'two', THIRD => 'three', FOURTH => 'four', FIFTH => 'five', }, $class; } sub method { my( $self ) = @_; (undef) = $self->{FIRST}; (undef) = $self->{SECOND}; (undef) = $self->{THIRD}; (undef) = $self->{FOURTH}; (undef) = $self->{FIFTH}; $self->{FIRST} = 1; $self->{SECOND} = 2; $self->{THIRD} = 3; $self->{FOURTH} = 4; $self->{FIFTH} = 5; } package main; use Benchmark qw[ cmpthese ]; use Devel::Size qw[ size total_size ]; cmpthese -1, { arrayBased => q[ my $o = new ArrayBased; $o->method; ], hashBased => q[ my $o = new HashBased; $o->method; ], }; print "ArrayBased bytes: ", total_size( ArrayBased->new() ); print " HashBased bytes: ", total_size( HashBased->new() ); __END__ P:\test>462336 Rate hashBased arrayBased hashBased 48505/s -- -16% arrayBased 57853/s 19% -- ArrayBased bytes: 220 HashBased bytes: 373

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      I read the following as array-based objects are 20% faster and 70% 40% smaller than the hash based equivalent for little extra effort and no loss of clarity.

      Depends on what you're doing. Even with this simple case I'd argue that having to maintain the array indices constants away from new() already make it slightly less clear.

      Once you have inheritance of array based objects it gets tricker since you have to keep the constants in sync between the parent and child classes. Remembering whether you've used array index 8 is a lot harder than remembering if you've used hash key something_the_parent_does.

      Once you start refactoring classes and moving object fields between classes, adding fields to classes, etc. I find array based objects becomes a complete PITA.

        I would not advocate this use for anything other than those applications that require the performance or memory gains or both.

        Example: Data::Trie & Tree::Trie. Both of these modules use hash-based instantiations. The problem is that they require so much room for each node in the trie that they are next to useless for any practical purpose.

        It's a simple case of using Perl to it's best advantage to achieve the desired goal. Blanket statements that all Perl OO modules should be hash-based completely ignores the possibilities of 3 or more other ways Perl's flexibility can be used to good affect to avoid having to drop into the nightmare of XS programming or the deep dependancies of Inline::C.

        Perl6 will provide the Perl programmer with a much greater degree of control over the storage representation of data elements and aggregates and it will (hopefully) impose far less penalties for using OO dispatching. Till then, if your application needs to deal with large volumes of concurrent, in-memory instances, or you want to extract the best performance from a given class, using the flexibility that Perl's OO mechanism provides to achieve your requirements is simple practical.

        Noone's advocating that every Perl class should be re-written to use an array-based representation to gain the 20%/40% benefits they can provide. If you don't need to optimise don't. If throwing hardware at the problem is an option; do so. It's just another option available to those that want or need it that can be safetly (and quietly) ignored by those that either don't need it or choose not to use for whatever reasons.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
        "Science is about questioning the status quo. Questioning authority".
        The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://462446]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (6)
As of 2024-04-23 11:53 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found