in reply to how to merge many files of sorted hashes?
If the goal is to have everything in one big file ordered by key, and you already have several smaller 500MB files that are each ordered by key, then you just want to do a straight merge:
1. open all of the files
2. read a key from each file
3. sort the filehandles by key
4. from the file corresponding to the smallest key,
5. read that value,
6. copy the key and value to the output
7. read the next key from the same file,
8. move that filehandle to the place on the filehandle list corresponding to the new key (or just sort the list again, if it's really short)
9. go back to 4 and repeat until all of the files are exhausted.
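The steps above can be sketched roughly as follows. The record format is not given in the thread, so this assumes one "key<TAB>value" record per line and plain string ordering of keys; merge_sorted_files and read_record are made-up names for illustration only.

```perl
use strict;
use warnings;

# Assumption: each input line is "key<TAB>value", and every input file is
# already sorted by key under string comparison (cmp).
sub read_record {
    my ($fh) = @_;
    my $line = <$fh>;
    return unless defined $line;          # this file is exhausted
    chomp $line;
    my ($key, $value) = split /\t/, $line, 2;
    return +{ fh => $fh, key => $key, value => $value };
}

sub merge_sorted_files {
    my ($out_path, @in_paths) = @_;
    open my $out, '>', $out_path or die "Cannot write $out_path: $!";

    # Steps 1-3: open every file, read one record from each, sort by key.
    my @heads;
    for my $path (@in_paths) {
        open my $fh, '<', $path or die "Cannot read $path: $!";
        my $rec = read_record($fh);
        push @heads, $rec if $rec;        # skip empty files
    }
    @heads = sort { $a->{key} cmp $b->{key} } @heads;

    # Steps 4-9: emit the smallest record, then refill from the same file.
    while (@heads) {
        my $rec = shift @heads;
        print {$out} "$rec->{key}\t$rec->{value}\n";

        my $next = read_record($rec->{fh});
        if ($next) {
            # Step 8: put the new record back at its sorted position.
            my $i = 0;
            $i++ while $i < @heads && $heads[$i]{key} le $next->{key};
            splice @heads, $i, 0, $next;
        }
        else {
            close $rec->{fh};
        }
    }
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

Called as merge_sorted_files('merged.txt', 'part1.txt', 'part2.txt') with whatever file names you actually have. The linear scan in step 8 is fine for a handful of files; with many files a heap would be the usual replacement.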
On the other hand, I'm still not clear on why you'd want everything in one file; much depends on how you're going to be using this file thereafter.
Instead of copying the value out in step 6, you may do just as well to call tell() to get the disk position and record that. That way you can have a master file that associates every key with a disk position and a number from 1..n indicating which file it is in, and then you're not having to copy any files at all.
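A minimal sketch of that index idea, under the same assumed "key<TAB>value" line format. The index layout here ("key<TAB>file number<TAB>byte offset") and the names build_index, read_head and fetch_value are hypothetical, and the offset recorded is where the whole record starts rather than where the value starts.

```perl
use strict;
use warnings;

# Assumption: each input line is "key<TAB>value", sorted by key within a file.
sub read_head {
    my ($fh, $file_no) = @_;
    my $pos  = tell $fh;                  # byte offset where this record starts
    my $line = <$fh>;
    return unless defined $line;
    my ($key) = split /\t/, $line, 2;
    chomp $key;                           # only matters if the line had no tab
    return +{ fh => $fh, file_no => $file_no, key => $key, pos => $pos };
}

sub build_index {
    my ($index_path, @in_paths) = @_;
    open my $idx, '>', $index_path or die "Cannot write $index_path: $!";

    my @heads;
    my $file_no = 0;
    for my $path (@in_paths) {
        $file_no++;                       # files are numbered 1..n
        open my $fh, '<', $path or die "Cannot read $path: $!";
        my $rec = read_head($fh, $file_no);
        push @heads, $rec if $rec;
    }

    while (@heads) {
        # The list is short, so simply sort it again on every pass.
        @heads = sort { $a->{key} cmp $b->{key} } @heads;
        my $rec = shift @heads;
        print {$idx} join("\t", @{$rec}{qw(key file_no pos)}), "\n";
        my $next = read_head($rec->{fh}, $rec->{file_no});
        push @heads, $next if $next;
    }
    close $idx or die "Cannot close $index_path: $!";
    return;
}

# Look a value up later: seek to the recorded offset and read one record.
sub fetch_value {
    my ($path, $pos) = @_;
    open my $fh, '<', $path or die "Cannot read $path: $!";
    seek($fh, $pos, 0) or die "Cannot seek in $path: $!";
    my $line = <$fh>;
    chomp $line;
    my (undef, $value) = split /\t/, $line, 2;
    return $value;
}
```

Whether this beats a straight merge depends on how the data is read afterwards: lookups by key become one seek each, but a sequential scan in key order will jump between the original files.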
In Section: Seekers of Perl Wisdom