Hmm... looks like I forgot to mention that I'd rather use the iterator than slurp everything into a hash. Since it comes from a database, the iterator also guarantees the order (dates in order, woody words before tinny), which I hope would have greatly reduced the need for sorting.
Anyway, I've yet to study your code properly, but right now I'm thinking of wrapping my iterator into a second iterator that buffers a day's worth of words. (Dominus's book seems to have had an effect on me.) It's not the nicest answer but one of the easier ones I guess.
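To make the buffering idea concrete, here is a rough sketch of what I have in mind, written in Python just for illustration. It assumes the underlying iterator yields (date, word) pairs already sorted by date; the name buffer_by_day and the sample rows are made up for the example and are not from my real code.

```python
def buffer_by_day(rows):
    """Wrap an iterator of (date, word) pairs, assumed to arrive in date
    order, and yield (date, words) with one day's worth of words at a time."""
    current_date = None
    words = []
    for date, word in rows:
        # A change of date means the previous day's buffer is complete.
        if words and date != current_date:
            yield current_date, words
            words = []
        current_date = date
        words.append(word)
    # Flush the final day once the underlying iterator is exhausted.
    if words:
        yield current_date, words


# Hypothetical sample data standing in for the database iterator.
sample_rows = [
    ("2005-03-01", "gorn"),
    ("2005-03-01", "sausage"),
    ("2005-03-01", "tin"),
    ("2005-03-02", "caribou"),
    ("2005-03-02", "newspaper"),
]

for day, day_words in buffer_by_day(iter(sample_rows)):
    print(day, day_words)
```

Only one day's words are held in memory at any time, and the order within a day is whatever the inner iterator delivered, so nothing needs re-sorting as long as the database ordering holds.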