anli_:
Considering that the bulk of your data appears to be the text representation of various paths through a taxonomy tree, I thought that you might be able to fit it all into memory (taking advantage of all the redundancy) if you built a pair of trees and connected them together at the leaves. For example, if your data looked like this:
1;2;8;root;xyzzy;cat
1;2;5;root;xyzzy;dog
1;9;root;bird
Then we could build an index tree (on top) and the taxonomy tree (below), tying them together (shown by vertical lines) like this:
1
/ \
/ \
2 \
--^- \
/ \ \
8 5 9
| | |
cat dog bird
\ / /
xyzzy /
\ /
root
If your tree is relatively shallow but broad, you should be able to save a considerable amount of space.
Here's some code that builds the two trees and looks up some of the numeric keys to display the data. Let me know if it does the trick for you, I'd be interested in knowing how much memory it takes to hold your database.
The taxonomy tree has parent links in it to let you get from the leaf to the root of the tree, and the traceback function will walk the tree back to the root for you.
Update: I added a couple comments to the code to clarify it a little.
...roboticus
When your only tool is a hammer, all problems look like your thumb. |