Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

Re: Binary searching with Perl.

by Dominus (Parson)
on May 25, 2003 at 04:47 UTC ( #260659=note: print w/replies, xml ) Need Help??

in reply to Binary searching with Perl.

As more food for thought, I'll present an application of binary search where it doesn't make sense to 'use a hash'. Suppose you're looking up records in a large sorted file. The following search subroutine will search the file efficiently, using a binary search. It gets two arguments: $fh is a filehandle open to the file you want to search, and $cmp is a comparison function. search will call $cmp repeatedly with various records from the file; $cmp should return zero if its argument is the desired record, negative if its argument is earlier than the desired record, and positive if its argument is later than the desired record.

search then returns the first record in the file that is not earlier than the desired record, and leaves the filehandle positioned at that record. If there is no such record, it returns undef.

sub search { my ($fh, $cmp) = @_; my ($lo, $hi) = (0, -s $fh); while (1) { my $mid = int(($lo + $hi)/2); if ($mid) { seek $fh, $mid-1, SEEK_SET; my $junk = <$fh>; } else { seek $fh , $mid, SEEK_SET; } my $start = tell $fh; my $rec = <$fh>; return unless defined $rec; chomp($rec); if ($hi == $lo) { seek $fh, $start, SEEK_SET; return $rec }; local $_ = $rec; if ($cmp->($rec) < 0) { $lo = $mid+1 } else { $hi = $mid } } }
For example, suppose you want to search the dictionary file for words that begin with foo. You might:
open DICT, "<", "/usr/dict/Web2" or die...; if (search(\*DICT, sub { lc $_ cmp "foo" })) { # DICT is now positioned at the first word beginning with 'foo' while (<DICT>) { last unless /^foo/i; print; } }
The search call moves the filehandle quickly to the first record that begins with foo, and then the while loop prints out the words, quitting after the last one. Using search is quicker than scanning the file from the beginning looking for the first foo word, unless the file is very small.

The behavior is similar to that of the standard Search::Dict module, but the code is much simpler, and I think my interface is more flexible. I wrote this as an example for my new class Strategies for Lightweight Databases. Share and enjoy!

Mark Dominus
Perl Paraphernalia

Replies are listed 'Best First'.
Re: Re: Binary searching with Perl.
by tilly (Archbishop) on May 25, 2003 at 04:51 UTC
    My inclination would still be to use DB_File or a relative to access a BTREE. One of the methods in the API is exactly equivalent to your search interface, but the implementation should be more performance-efficient. (And using their API takes less code.)
      Says tilly:
      My inclination would still be to use DB_File or a relative to access a BTREE.
      That's a good solution in some cases, but not always. The world is full of sorted text files, and it doesn't always make sense to manufacture a much larger secondary file, which must be maintained in parallel with the text file, just to perform searches on the data.

      Mark Dominus
      Perl Paraphernalia

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://260659]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2018-03-24 16:04 GMT
Find Nodes?
    Voting Booth?
    When I think of a mole I think of:

    Results (299 votes). Check out past polls.