Re: “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!!

Gentlemen,

First of all, a huge thank-you to everybody. As always, it's a pleasure to be here and have my questions answered. I *love* this community of wise, intelligent and professional hackers :-) (although I am not too much into Perl, sorry :-)).

A few general remarks:
1. I am sorry if my initial post caused a bit of frustration for some of you. It wasn’t intended to be. My point was:
I expect a higher quality of answers in FAQ than in general posts. And the more F is a Q, the higher quality the A (I assume) should be. That’s all.
It’s just my opinion of course, but I think many people treat “FAQ” as the only source of truth...

2. Thanks a lot for the two _binary search_ answers. Both seem to work; I'll do some corner-case and performance testing, compare them to the hash implementation as well, and probably post the results here (if nobody minds).
2-a) I've made two changes in find_int_in_array:
A)

    
    #my ( $arref, $targ ) = @_; # args are array ref, int value
    my ( $targ, $arref ) = @_;  # args are: (int, array ref)

This is just a “style” thing: it's easier to remember (int in arr, not arr in int), and it's easier to test together with other implementations using the same driver.
B)
Changed

    my $nextidx = $asize / 2;
    my $nextinc = $nextidx / 2;

to:

    my $nextidx = int($asize / 2);
    my $nextinc = int($nextidx / 2);

(Otherwise, for an array of 5 elements, it would return an index of 2.5 ;-))
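To illustrate where the int() comes in, here is a minimal binary search sketch using the (int, array ref) argument order. It is not the find_int_in_array from the earlier answers (that code isn't reproduced in this post); the name bsearch_int and the "-1 means not found" convention are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the original find_int_in_array from the thread.
# Args are (int, array ref); the array must be sorted numerically ascending.
# Returns the index of $targ, or -1 if it is not present.
sub bsearch_int {
    my ( $targ, $arref ) = @_;
    my ( $lo, $hi ) = ( 0, $#{$arref} );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );    # int() keeps the index whole
        if    ( $arref->[$mid] < $targ ) { $lo = $mid + 1 }
        elsif ( $arref->[$mid] > $targ ) { $hi = $mid - 1 }
        else                             { return $mid }
    }
    return -1;
}

my @sorted = ( 3, 8, 15, 23, 42 );
print bsearch_int( 15, \@sorted ), "\n";    # prints 2
print bsearch_int( 16, \@sorted ), "\n";    # prints -1
```

Without the int(), $mid could come out as 2.5 and that fractional value is what the sub would hand back to the caller.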

3. Re: using <code> vs. <pre>:
I do know the rules and tried to follow them, but on my screen either <code> is too small to read or, with the font size increased, everything else is too big and bold. Maybe somebody can review this policy? (A minor thing, of course.)

4. ($#array + 1) vs. scalar(@array) :
You’re going to laugh, but I did a performance test :). The result: $#array is ~10-15% faster than the other one. Intuitively, that’s what one would expect (my wild guess is that $#array is probably an internal counter that is always kept up to date, while scalar(@array) gets calculated).
This is a purely academic question, of course :-) It’s hard to imagine an application that would suffer from using the 2nd approach.
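A comparison like this can be reproduced with the core Benchmark module. The sketch below is not the test that produced the numbers above; the array size and the labels are assumptions, and the ratio will likely vary with the Perl version and the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @array = ( 1 .. 1000 );

# Both expressions give the element count; only the timing may differ.
my %tests = (
    last_index_plus_1 => sub { my $n = $#array + 1 },
    scalar_array      => sub { my $n = scalar(@array) },
);

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese( -1, \%tests );
```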

5. Using a hash vs. Binary search.
Again, a hash works just fine, except for three drawbacks:

memory;
it slows down for huge amounts of data (vs. binary search, which is always logarithmic, O(log n));
complexity :-)

Enough said about #1 and #2; let me explain the 3rd one:
Imagine an application that updates an array rarely and looks things up often. Say you have 10,000,000 customers / books / whatever that get added / published only once a minute, but information is requested 1000 times per second.
So, re-building the hash for lookup 1000 times a second is clearly not an option. Right?
A solution to that might be (I’ll think in some pseudo-language now):

    class Array_with_Hash:

        @the_array = ();
        %the_hash  = ();

        init(@a)
        {
            put(a => the_array);
            recreate_hash(the_array, the_hash);
        }

        pop()
        {
            el = the_array.pop();
            update_hash(el);
        }

        ...

Well, if you know for sure that push and pop are *all* that you do, that might be possible. However, creating a generic class like this might be really tricky:

the array must be locked (no direct access to it, only through this class’s methods);
hash values probably have to be arrays of indexes;
implementing sort(), slice-and-dice etc. might take some time;
it’s quite easy to “forget” about something...
...
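For the narrow push/pop-only case, the idea can be sketched in Perl roughly as follows. The package name ArrayWithHash and its method names are hypothetical, and the hash here only stores occurrence counts (enough for membership tests), not the arrays of indexes a generic version would probably need.

```perl
use strict;
use warnings;

# Hypothetical sketch: package name and method names are illustrative only.
package ArrayWithHash;

sub new {
    my ( $class, @items ) = @_;
    my $self = bless { array => [], count => {} }, $class;
    $self->push_item($_) for @items;
    return $self;
}

# Append a value and bump its occurrence count in the hash.
sub push_item {
    my ( $self, $val ) = @_;
    push @{ $self->{array} }, $val;
    $self->{count}{$val}++;
    return;
}

# Remove the last value and drop its hash key once no copies remain.
sub pop_item {
    my ($self) = @_;
    return unless @{ $self->{array} };
    my $val = pop @{ $self->{array} };
    delete $self->{count}{$val} if --$self->{count}{$val} == 0;
    return $val;
}

# Membership test: one hash probe, no rebuild.
sub contains {
    my ( $self, $val ) = @_;
    return exists $self->{count}{$val} ? 1 : 0;
}

package main;

my $store = ArrayWithHash->new( 10, 20, 20 );
$store->pop_item;                       # removes one of the two 20s
print $store->contains(20), "\n";       # prints 1
$store->pop_item;                       # removes the other 20
print $store->contains(20), "\n";       # prints 0
```

Even this small version only stays consistent as long as nothing touches the underlying array directly, which is the locking concern from the list above.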

This only means that the hash approach only works if

cost(recreating the hash every time) == nothing
OR
cost(programming time) == nothing..:)

In other words, thanks again for the binary search in Perl! (And it should be in the FAQ, too... :-))

Re^2: “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!! by graff (Chancellor) on Sep 06, 2006 at 06:46 UTC
On another topic...

"Imagine an application that updates an array rarely and looks up often. Say you have 10,000,000 customers / books / whatever that get added / published only once a minute, but information is requested 1000 per second. So, re-building the hash for lookup 1000 times a second is clearly not an option. Right?"

Um, if the application is as you describe, why would you want to use an array at all? With that sort of ratio between updates and searches, it would be better just to maintain a hash instead of an array. Surely you do not want to "recreate the hash every time" in order to implement the search; create the hash once and maintain it, assuming it fits in memory -- and if not, consider a DBM_File approach (cf. AnyDBM_File), or just use a real database backend.

If your issue is "using a (temporary) hash to store a copy of an array so that the array can be searched for specific values", yes, that's a bad approach for really big arrays and the kind of update/search ratio you're suggesting. But you're not explaining why the primary data storage needs to be an array.

If the issue is really "coming up with a viable app to support lots of searches for specific values in a large set", the answer is more likely to be: start by using a hash as the primary storage, and use the values to be searched for as hash keys.
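A minimal sketch of that suggestion, with the hash as the only storage. The customer IDs and records are made-up illustration data, not anything from the thread.

```perl
use strict;
use warnings;

# Hypothetical data: customer IDs as keys, records as values.
my %customer = (
    1001 => { name => 'Alice' },
    1002 => { name => 'Bob' },
);

# The rare update: add one entry; nothing gets rebuilt.
$customer{1003} = { name => 'Carol' };

# The frequent lookup: a single hash probe per request.
for my $id ( 1002, 1003, 9999 ) {
    if ( exists $customer{$id} ) {
        print $id, ": ", $customer{$id}{name}, "\n";
    }
    else {
        print $id, ": not found\n";
    }
}
```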
Re^2: “A meeting at the Liquor-Vodka Factory”, or… same ARRAY questions again?!! by graff (Chancellor) on Sep 06, 2006 at 06:19 UTC
"Changed `my $nextidx = $asize / 2; my $nextinc = $nextidx / 2;` to: `my $nextidx = int($asize / 2); my $nextinc = int($nextidx / 2);` (Otherwise, for an array of 5 elements, it’d return an index = 2.5 ;-...))."

Actually, that change is unnecessary. Whenever a floating point value is used as an array index, Perl automatically uses just the integer part of the value, since that would be the only sensible thing to do.
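The truncation is easy to check with a short script; the array contents here are just an illustration.

```perl
use strict;
use warnings;

my @letters = qw(a b c d e);
my $idx     = @letters / 2;     # 2.5, no int() applied

print "index value: $idx\n";                # prints "index value: 2.5"
print "element:     $letters[$idx]\n";      # prints "element:     c"
```

Note that the truncation happens only at lookup time: $idx itself is still 2.5, so a sub that returns the index rather than the element would pass the fractional value back to its caller.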


	PerlMonks