Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"

Re: Hash key or value string reuse?

by jethro (Monsignor)
on Jun 15, 2009 at 23:32 UTC ( #771829=note: print w/replies, xml ) Need Help??

in reply to Hash key or value string reuse?

A bitstring that can't be represented numerically? Any bitstring can be *represented* numerically. I guess you mean your bitstrings are stored as binary data which can't be printed as is without producing garbage on the screen

To answer your question, no, perl doesn't do memory optimization for hashes. There might be a module for that, but I doubt the general case is so easy to deal with

I think an example would have helped. Do the similarities in strings start at the same position in the bitstring? Or do similarities at least start at the same position, but the position varies? If they are not encoded as raw binary data, how are they encoded?

Generally your problem is to find common subexpressions fast. If a very suboptimal packing is enough you could pack always from the start of a bitstring and always the same length. Then a simple HashOfHashes can use all the common subexpressions as keys and the subtree as values. Even a few such HoHs with different lengths and starting bits would be possible. But then you either have to be careful to always add to and search in them in the same sequence or can't short circuit a search when you found a subexpression and no value.

You probably found above solution already yourself, but when giving general information you usually get general solutions

Replies are listed 'Best First'.
Re^2: Hash key or value string reuse?
by QM (Parson) on Jun 16, 2009 at 15:31 UTC
    A bitstring that can't be represented numerically?
    I didn't say it well, but I meant that it can't be represented in native machine format, at least without some arbitrary precision extension. I could use vec, but didn't do so as it would require converting from string to vec and back, with the vec version just for storage (no direct bit lookup is needed). So they are stored as strings, which in Perl are arbitrary "precision".

    Perhaps I lead you astray, as I wasn't looking for any encoding or compression-type solution. The number of unique strings isn't so large, but the total number of strings stored in the hash is. So if there's a way to store the actual string once, and either Perl or the code points to it as needed, that would be great.

    Quantum Mechanics: The dreams stuff is made of

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://771829]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (4)
As of 2018-01-21 05:55 GMT
Find Nodes?
    Voting Booth?
    How did you see in the new year?

    Results (227 votes). Check out past polls.