Re^2: Hash lookups, Database lookups, and Scalability

by jplindstrom (Monsignor)
on Oct 31, 2004 at 15:11 UTC (#404134)

in reply to Re: Hash lookups, Database lookups, and Scalability
in thread Hash lookups, Database lookups, and Scalability

Also, if the table has a clustered index (Sybase/SQL Server) or the table is index organized (Oracle) or somesuch, there is no extra disk IO to read the row data.

(From your description with 2 physical reads, it sounds like you used a clustered PK index.)

Re^3: Hash lookups, Database lookups, and Scalability
by mpeppler (Vicar) on Oct 31, 2004 at 15:53 UTC
    There are various issues that enter into calculating the number of IOs. If the index covers the query (i.e. all the columns that are required for the output are part of the index), then only the index page/row needs to be read.
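
    A minimal sketch of the covering-index point, assuming SQLite (through Python's standard sqlite3 module) rather than Sybase, and a hypothetical two-column words table like the one discussed in this thread. The sample rows and the plan() helper are illustrative only. It asks the engine for its query plan with a key-only index and then with an index that contains every column the query needs:

```python
import sqlite3

# Hypothetical schema modeled on the thread: a two-column word mapping.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE words ("left" TEXT, "right" TEXT)')
conn.executemany(
    "INSERT INTO words VALUES (?, ?)",
    [("alpha", "one"), ("beta", "two"), ("gamma", "three")],
)

def plan(sql):
    """Return the detail column of EXPLAIN QUERY PLAN as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("beta",)).fetchall()
    return " | ".join(row[-1] for row in rows)

query = 'SELECT "right" FROM words WHERE "left" = ?'

# Index on the key only: the index locates the row, then the table row is read.
conn.execute('CREATE INDEX left_ix ON words("left")')
plan_key_only = plan(query)

# Index on both columns: the query can be answered from the index alone.
conn.execute("DROP INDEX left_ix")
conn.execute('CREATE INDEX left_ix ON words("left", "right")')
plan_covering = plan(query)

print(plan_key_only)
print(plan_covering)
```

    In SQLite the second plan should mention a COVERING INDEX, which is that engine's way of saying the table row is never touched. Other engines report this differently, and the exact plan text varies between SQLite versions.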

    If a clustered index is used, then with Sybase 11.9.2 and later the answer depends on the table's locking scheme. If the table uses "all pages" locking, the leaf index page and the data page are one and the same. If the table uses "data only" locking (commonly referred to as DOL), the index leaf pages and the data pages are separate.

    Which all means that estimating the number of IOs for a particular query can be a lot of fun, and is the reason why cost-based optimizers (such as the one Sybase uses) are pretty complicated beasts...


      I see that a properly constructed index is going to be vital for optimum performance. Given the requirement for dual cross-referenced lookups, how might I better construct the indices in the code I posted? I'm curious to see if the DB solution can be better optimized.


        Indexes and query behavior are usually pretty closely tied to the way a particular database engine works - and I don't know SQLite at all, so I can't really help you with specifics.

        However, your table schema is exceedingly simple, so you really only have two choices:

        create unique index left_ix on words(left)
        create unique index left_ix on words(left, right)
        (and their opposites, i.e. the same indexes keyed on "right").

        The first form is more "correct" - you really only want the key in the index. The second form may give you slightly better performance, because the index then covers the lookup and the data row need not be read. That comes at the expense of allowing duplicate "left" words into the table as long as they point at a different "right" word, and of slightly more work during inserts (index maintenance is a little more complicated).

        Personally I'd use the first form (index on "left", and a separate index on "right").
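
        To make the uniqueness trade-off concrete, here is a rough sketch, again assuming SQLite through Python's sqlite3 module and a hypothetical words table. The try_inserts() helper and the sample word pairs are made up for illustration. It builds the table once with each index form and records which rows the unique index lets in:

```python
import sqlite3

def try_inserts(index_sql):
    """Build a fresh table with the given index and report which rows get in."""
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE words ("left" TEXT, "right" TEXT)')
    conn.execute(index_sql)
    accepted = []
    for pair in [("cat", "chat"), ("cat", "gato"), ("cat", "chat")]:
        try:
            conn.execute("INSERT INTO words VALUES (?, ?)", pair)
            accepted.append(pair)
        except sqlite3.IntegrityError:
            pass  # rejected by the unique index
    conn.close()
    return accepted

# First form: "left" alone must be unique.
key_only = try_inserts('CREATE UNIQUE INDEX left_ix ON words("left")')

# Second form: only the (left, right) pair must be unique.
composite = try_inserts('CREATE UNIQUE INDEX left_ix ON words("left", "right")')

print(key_only)
print(composite)
```

        With the first form only one row per "left" word gets in. With the second form the same "left" word is accepted again as long as the "right" word differs, which is the duplicate problem described above.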

