
Re: Optimizing Iterating over a giant hash

by oron (Novice)
on Dec 30, 2009 at 16:58 UTC ( #814967=note: print w/replies, xml ) Need Help??

in reply to Optimizing Iterating over a giant hash

OK, problem solved :)
First of all, thanks for all the answers.

What I eventually did was a version of divide and conquer. While reading the initial data, instead of filling the hash, I wrote the records out in a more sort-friendly format to a new file, which I then sorted (shamefully, with the Linux sort utility). I could then process consecutive lines that start the same way (same entry + timestamp) and output the result. This also gave me sorted output for free.
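A sketch of that approach might look like the following. The file names, the tab-separated record layout, and the handler callback are all illustrative assumptions, not the poster's actual code; the key point is that after an external sort, identical entry+timestamp keys are adjacent, so only one group ever needs to be in memory:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Step 1 (while reading the raw data): emit one sortable line per record,
# e.g.  print {$out} join("\t", $entry, $timestamp, $rest), "\n";
#
# Step 2 (externally, memory use stays bounded):
#   sort -t "$(printf '\t')" -k1,1 -k2,2 intermediate.txt > sorted.txt
#
# Step 3: stream the sorted file; lines sharing entry+timestamp are now
# adjacent, so we flush each finished group to a handler and move on.
sub process_sorted {
    my ($fh, $handler) = @_;
    my ($prev_key, @group);
    while (my $line = <$fh>) {
        chomp $line;
        my ($entry, $ts, $rest) = split /\t/, $line, 3;
        my $key = "$entry\t$ts";
        if (defined $prev_key && $key ne $prev_key) {
            $handler->($prev_key, \@group);    # flush the finished group
            @group = ();
        }
        $prev_key = $key;
        push @group, $rest;
    }
    $handler->($prev_key, \@group) if defined $prev_key;   # last group
}
```

Because the heavy lifting (grouping) is delegated to sort(1), the Perl side only ever holds one group at a time, regardless of the total data size.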

I liked the idea of separating the internal hash into per-key lists - this might actually decrease lookups, and memory should hold out since those lists are relatively short.
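That hash-of-lists idea could be sketched like this (the key and field names are made up for illustration): instead of one flat hash with composite keys, each entry maps to a short array reference, so there is one hash lookup per entry and each per-key list stays small:

```perl
use strict;
use warnings;

# One short list per entry instead of one giant flat hash with
# composite "entry|timestamp" keys.
my %by_entry;
push @{ $by_entry{foo} }, [ 1262190000, 'first'  ];
push @{ $by_entry{foo} }, [ 1262190060, 'second' ];

# Later, process each entry's (short) list in a single pass:
for my $entry (keys %by_entry) {
    my @records = sort { $a->[0] <=> $b->[0] } @{ $by_entry{$entry} };
    # ... handle the sorted @records for this $entry ...
}
```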

I did not use a database because I was under the impression that I would need an SQL server (for example) to be running, and I don't have one. Am I wrong? This could be useful...

Re^2: Optimizing Iterating over a giant hash
by GrandFather (Sage) on May 23, 2010 at 20:36 UTC

    I strongly recommend you have a play with SQLite (DBD::SQLite) to dip your toe into the database waters. It is completely self-contained, even to the extent that the DBD 'driver' includes the database engine, so no server process is required. It is ideal for the sort of application this thread discusses (although I'm not recommending you re-engineer your current solution). Having database tools in your toolbox is pretty likely to be useful to you, to the extent that having a bit of a play now is likely to pay dividends in the longer term.
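A minimal DBD::SQLite sketch, assuming the module is installed (the table and column names are invented for illustration); note there is no server to connect to, the whole database is a single file, or here an in-memory throwaway:

```perl
use strict;
use warnings;
use DBI;

# Connect to an in-memory SQLite database; use a filename instead of
# ':memory:' to persist the data between runs. No server needed.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE events (entry TEXT, ts INTEGER, payload TEXT)');

my $ins = $dbh->prepare(
    'INSERT INTO events (entry, ts, payload) VALUES (?, ?, ?)');
$ins->execute('foo', 1262190000, 'first');
$ins->execute('foo', 1262190000, 'second');

# The grouping work the giant hash was doing can be pushed into SQL:
my $rows = $dbh->selectall_arrayref(
    'SELECT entry, ts, COUNT(*) FROM events
     GROUP BY entry, ts ORDER BY entry, ts');
```

The GROUP BY does on disk what the hash was doing in memory, and an index on (entry, ts) would keep it fast for large data sets.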

    True laziness is hard work
Re^2: Optimizing Iterating over a giant hash
by patrick.j (Acolyte) on May 23, 2010 at 17:19 UTC
    There is, on the one hand, a core module called DB_File, which lets you, simply put, keep a hash on disk, but every access to the hash is then a filesystem I/O operation (which means it is slow). There is another module on CPAN, called BerkeleyDB, but its documentation still has many TODOs in it.
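Tying a hash to a Berkeley DB file with DB_File looks like this sketch (it assumes a Berkeley DB library is available on the system; the key and value are placeholders). The hash API stays exactly the same, which is the module's main appeal:

```perl
use strict;
use warnings;
use Fcntl;        # for the O_RDWR / O_CREAT flags
use DB_File;
use File::Temp qw(tempfile);

# Tie a hash to an on-disk Berkeley DB file; every read or write of
# %on_disk then goes through to the file rather than to RAM.
my (undef, $dbfile) = tempfile();
tie my %on_disk, 'DB_File', $dbfile, O_RDWR | O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $dbfile: $!";

$on_disk{some_key} = 'some_value';   # written through to $dbfile
untie %on_disk;                      # flush and close the file
```

The trade-off the post describes is visible here: the code is unchanged from an ordinary hash, but each access costs a disk operation instead of a memory lookup.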
