My expectation is that most databases would use a well-known datastructure (such as a BTree) to store this kind of data. Which avoids a million directory entries, and also allows for variable length data. I admit that an RDBMS might do this wrong. But I'd expect most of them to get it right first try. Certainly BerkeleyDB will.
As for the "file with big holes" approach, only some filesystems implement that. Furthermore depending on how Perl was compiled and what OS you're on, you may have a fixed 2 GB limit on file sizes. With real data, that is a barrier that you're probably not going to hit. With your approach, the file's size will always be a worst case. (And if your assumption on the size of a record is violated, you'll be in trouble - you've recreated the problem of the second situation that you complained about in point 1.)
I'd also be curious to see the relative performance with real data between, say, BerkeleyDB and "big file with holes". I could see it coming out either way. However I'd prefer BerkeleyDB because I'm more confident that it will work on any platform, because it is more flexible (you aren't limited to numerical offsets) and because it doesn't have the record-size limitation.
Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big>
<blockquote> <br /> <dd>
<dl> <dt> <em> <font>
<h1> <h2> <h3> <h4>
<h5> <h6> <hr /> <i>
<li> <nbsp> <ol> <p>
<small> <strike> <strong>
<sub> <sup> <table>
<td> <th> <tr> <tt>
Snippets of code should be wrapped in
<code> tags not
<pre> tags. In fact, <pre>
tags should generally be avoided. If they must
be used, extreme care should be
taken to ensure that their contents do not
have long lines (<70 chars), in order to prevent
horizontal scrolling (and possible janitor
Want more info? How to link or
or How to display code and escape characters
are good places to start.