My expectation is that most databases would use a well-known datastructure (such as a BTree) to store this kind of data. Which avoids a million directory entries, and also allows for variable length data. I admit that an RDBMS might do this wrong. But I'd expect most of them to get it right first try. Certainly BerkeleyDB will.
As for the "file with big holes" approach, only some filesystems implement that. Furthermore depending on how Perl was compiled and what OS you're on, you may have a fixed 2 GB limit on file sizes. With real data, that is a barrier that you're probably not going to hit. With your approach, the file's size will always be a worst case. (And if your assumption on the size of a record is violated, you'll be in trouble - you've recreated the problem of the second situation that you complained about in point 1.)
I'd also be curious to see the relative performance with real data between, say, BerkeleyDB and "big file with holes". I could see it coming out either way. However I'd prefer BerkeleyDB because I'm more confident that it will work on any platform, because it is more flexible (you aren't limited to numerical offsets) and because it doesn't have the record-size limitation.