### Re^4: Modified Binary Search

by Limbic~Region (Chancellor)
 on Jan 14, 2010 at 23:32 UTC ( #817503=note: print w/replies, xml ) Need Help??

in reply to Re^3: Modified Binary Search
in thread Modified Binary Search

BrowserUk,
That's the way I had it coded originally but when it wasn't complete when I came back from lunch - I wasn't happy to continue waiting. Performing a linear search through 5 million records 600,000 times still takes longer than I wanted.

Cheers - L~R

Replies are listed 'Best First'.
Re^5: Modified Binary Search
by BrowserUk (Pope) on Jan 15, 2010 at 08:01 UTC

Hm. You have a big list and a small list of dates, both sorted (or readily sortable).

And for each date in the small list, you want to find the subset of the big list that are 1 year either side.

And you search anew for each item in the small list?

Here's a one-pass algorithm:

```my @bigSorted = ...;
my @smallSorted = ...;
my %ranges;
my( \$lo, \$hi ) = ( 0, 0 );
for ( @smallSorted ) {
my( \$early, \$late ) = calcBoundaryDates( \$_ );
++\$lo while \$bigList[ \$lo ] lt \$early;
\$hi //= \$lo;
++\$hi while \$bigList[ \$hi ] le \$late;
\$ranges{ \$_ } = \$hi - \$lo;
}

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

That works fine if you're doing it as a one-time deal for all elements of Tbl_B. LR's problem description seems to say that the search needs to be done for the modified elements each time Tbl_B is modified. Since the above technique does linear search, it's not going to be optimal.

LR's problem description seems to say ...

Hm. That's not how I read it, nor do I see how you could possibly derive that interpretation from the OP?

Besides Limbic~Region 'imself seemed to think it was okay and he is not exactly known for accepting incomplete solutions.

Since the above technique does linear search, it's not going to be optimal.

Perhaps what he saw that you missed, is that whilst it is a linear search, it is a linear search once over the 5 million. Ie. O(5E6).

Whereas, if you use a binary search, you'd do 600,000 * 5e6 * log 5e6 = O(20,096,910,013,008)

600,000 * log 5e6 = O(48,494,850)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
jdporter,
You and BrowserUk are both correct. In the real world problem, anytime there was a change the process had to begin anew. In my "snapshot in time" use case, I only had to run it once. My goal was to demonstrate that changing the behavior of the vendor provided C compiled stored procedure was to our advantage. Everyone was convinced but after a year, I am still waiting to see the change.

Cheers - L~R

Create A New User
Node Status?
node history
Node Type: note [id://817503]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (3)
As of 2021-06-12 23:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?
What does the "s" stand for in "perls"? (Whence perls)

Results (53 votes). Check out past polls.

Notices?