Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options

Re^3: Performance, Abstraction and HOP

by Anonymous Monk
on Sep 01, 2005 at 17:38 UTC ( #488437=note: print w/replies, xml ) Need Help??

in reply to Re^2: Performance, Abstraction and HOP
in thread Performance, Abstraction and HOP

A counter example is trees & tries. These are immensely useful structures for many purposes, and there are quite a few flavours of both on CPAN. But, for the most part, they are almost useless for anything but experimentation and the most trivial of applications. They are, mostly, based upon using hashes to construct the trees, with the result that the are slow, clumsy and hugely memory hungry.
I'm guessing that you mean that tries are implemented with hashes. I can't image trees implemented with arrays being that much more memory hungary than arrays alone. Sure, operations on trees will be slower than similar operations on arrays, but that's mostly comparing perl vs. C.
#!/usr/bin/perl -w use strict; use Data::Dumper; my $MAX = 10000; my $tree = make_tree($MAX/2,undef,undef); for(my $x=1;$x<$MAX;$x++) { $tree = insert(int(rand($MAX)), $tree); } print "sum = ", sum_tree($tree), "\n"; #print Dumper $tree; sub sum_tree { my $tree = shift; return 0 if not defined($tree); return node($tree) + sum_tree(right($tree)) + sum_tree(left($tree) +); } sub insert { (my $elem, my $tree) = @_; if(not defined($tree)){ return make_tree($elem, undef, undef); } my $curr = node($tree); if( $elem == $curr) { return $tree; } elsif($elem < $curr) { return make_tree($curr, insert($elem,left($tree)), right($tree)); } elsif($elem > $curr) { return make_tree($curr, left($tree), insert($elem,right($tree))); } } sub make_tree { [$_[0], $_[1], $_[2]] } sub node { $_[0]->[0] } sub left { $_[0]->[1] } sub right { $_[0]->[2] }

Replies are listed 'Best First'.
Re^4: Performance, Abstraction and HOP
by BrowserUk (Pope) on Sep 01, 2005 at 18:31 UTC
    I can't image trees implemented with arrays being that much more memory hungary than arrays alone.

    Don't imagine--measure :)

    P:\test>junk Array[ 1..10000]: 200056 Sum @array = 50005000 Tree[ 1..10000 ]: 1120016 Sum tree = 50005000 Rate tree array tree 17.1/s -- -82% array 95.1/s 455% --

    I think that 6x bigger and 5x slower pretty much makes my point.

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      I think that 6x bigger and 5x slower pretty much makes my point.
      Holy crap, now I'm curious. Only 5x slower is faster than what I would have guessed. But 6x larger is crazy. Anyone know how much boxing Perl does? I would have thought that the rough estimate for the size of a scalar number would be say 16 bytes (4 bytes for a pointer, 4 bytes for type/reference-count information, 8 bytes for a double precision floating point number). And the overhead for an array at maybe 16 bytes (8 bytes for a length field, 4 bytes for type/reference-count info, 4 bytes for a pointer to the array of pointers). For the tree structure above (an array composed of one scalar and two arrays) that would be 16 (array overhead)+3*4(three elements in first array, 4 byte pointers (32 bit machine))+16(the scalar)+2*16(the left and right branches) = 76 bytes. I guess that's starting to add up, but it is still shy of the 112 bytes measured above. Perl must preallocate space for each array to make growing it faster (maybe 12 elements initially?). Does that sound about right? Any way to get a more slimed down data structure in pure Perl?
        Is 6x really that crazy? Doesn't a typical generic C tree use about 4x? (If you're storing integers in the tree. If you are storing something larger, the overhead is smaller.) The perl array includes at least a reference count and a length that aren't in the C Node struct, so 6x quickly becomes plausible.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://488437]
[Corion]: Discipulus: Well, in many cases it doesn't make sense to build an interface and complicated program just to enter 20 rows into a database ;) But yes, automating data imports should pay off in the long run
[LanX]: Choroba: this happened before I joined, was still in uni, but my boss was summoned to the CEO of the second biggest German bank at that time and could only say " I told them its not ready" ;)
[LanX]: memories....I missed my connection while chatting
[Discipulus]: in this case Corion we are speaking about software licensing: evry year or two we must rescan the whole ced to produce an excel report, while at every activation / disactivation we update a black box DB: i said that i a week i can produce the perl to..
[Discipulus]: rend out the xls IF i have access to the DB
[choroba]: LanX I miss working in a bank sometimes...
[Corion]: Discipulus: Ooof. Especially yearly things are things I like to automate instead of trying to remember how I did things last year...
[Corion]: And the second rule that I've learned is, that there is no one-off job, so writing a program for it pays off almost immediately. The third rule is to give all my programs numbers and have them reproduce that number in the name of their output files. :)
[Discipulus]: the true part is that also specification change between years.. but well our job is cheap but dont abuse of us.. ;=)

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (12)
As of 2017-03-29 12:04 GMT
Find Nodes?
    Voting Booth?
    Should Pluto Get Its Planethood Back?

    Results (350 votes). Check out past polls.