
perl hashes

by Anonymous Monk
on Jul 14, 2009 at 16:25 UTC ( [id://779978] )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

hi Monks,
How many entries can I store in a hash table?
What is the performance if I store more than 50 thousand records?

Replies are listed 'Best First'.
Re: perl hashes
by davorg (Chancellor) on Jul 14, 2009 at 16:27 UTC
    How many entries can I store in a hash table?

    That is determined by the size of your entries and the amount of memory in your computer.

    What is the performance if I store more than 50 thousand records?

    That is largely determined by the speed of the processor in your computer.

    For a data set of that size, you should probably be thinking about a database.
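    One way to follow that advice is to push the records into SQLite through DBI. This is only a minimal sketch, assuming the DBI and DBD::SQLite modules are installed; the table and key names are invented for illustration.

    ```perl
    use strict;
    use warnings;
    use DBI;

    # In-memory SQLite database; use a file name instead to persist it.
    my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                           { RaiseError => 1, AutoCommit => 0 });
    $dbh->do('CREATE TABLE records (k TEXT PRIMARY KEY, v TEXT)');

    # Insert 50,000 rows inside one transaction (much faster than
    # committing each row individually).
    my $ins = $dbh->prepare('INSERT INTO records (k, v) VALUES (?, ?)');
    $ins->execute("key$_", "value$_") for 1 .. 50_000;
    $dbh->commit;

    # Lookups by key stay fast thanks to the primary-key index.
    my ($v) = $dbh->selectrow_array(
        'SELECT v FROM records WHERE k = ?', undef, 'key42');
    print "$v\n";
    ```

    The win over a plain hash is that the data no longer has to fit in RAM, and you get indexing and querying for free.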



Re: perl hashes
by ikegami (Patriarch) on Jul 14, 2009 at 16:35 UTC

    How many entries can I store in a hash table?

    There's practically no limit. You'll run out of memory first.

    What is the performance if I store more than 50 thousand records?

    Lookups, insertions and deletions are O(1) on average; they do not depend on the size of the hash.

Re: perl hashes
by jrsimmon (Hermit) on Jul 14, 2009 at 18:30 UTC
    Your performance is determined by the type of operations you're performing. If you are not scanning the full key list (i.e., my @keys = keys(%bigHash);) and are always looking up values by a previously known key (i.e., my $value = $bigHash{$key};), then a hash maintains its performance quite well even when it is very large and, as stated earlier, memory/resource usage is your greatest concern.

    That said, Benchmark can help you determine the performance of the hash at different sizes.
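    A quick sketch of that idea using the core Benchmark module: compare lookups against a small and a large hash. The hash and key names here are invented for the test.

    ```perl
    use strict;
    use warnings;
    use Benchmark qw(cmpthese);

    # Build a 1,000-entry hash and a 100,000-entry hash.
    my (%small, %big);
    $small{"k$_"} = $_ for 1 .. 1_000;
    $big{"k$_"}   = $_ for 1 .. 100_000;

    # Time a single-key lookup against each; the rates should be
    # roughly the same, illustrating O(1) average lookup.
    cmpthese(200_000, {
        small_hash => sub { my $v = $small{'k500'} },
        big_hash   => sub { my $v = $big{'k500'} },
    });
    ```

    If the two rates come out close, hash size is not your bottleneck.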
Re: perl hashes
by roboticus (Chancellor) on Jul 15, 2009 at 00:30 UTC

    Your questions don't have enough information to be able to give good answers. As asked, the best answers I can give you are:

    How many entries can I store in a hash table?

    All of them, unless you run out of memory.

    What is the performance if I store more than 50 thousand records?

    It depends on what you're measuring.

    Uh, how do you define performance? Normally people mean time when they talk about performance, but it's by no means the only metric available.

    Having given you some useless answers, here are some more useful ones:

    As has been mentioned earlier in the thread, a hash table is limited by the amount of memory you have. (Disk space in some cases, since virtual memory will let you spill over to the disk drive.) The OS will consume some of your memory, and Perl doesn't store the items of a hash tightly packed together, so each entry carries overhead beyond the raw data. Generally, when I'm estimating, I find that if the number of items I want to store times the size of the items is more than 1/5 of my RAM, then I start to worry about it. Otherwise, I don't even think about it.
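    A back-of-the-envelope version of that 1/5-of-RAM rule. The per-entry overhead and RAM figures below are assumptions for illustration, not measured constants; real overhead varies with your perl version and build.

    ```perl
    use strict;
    use warnings;

    my $n_items        = 50_000;
    my $avg_item_bytes = 200;          # assumed average key + value size
    my $overhead       = 100;          # assumed per-entry hash overhead
    my $ram_bytes      = 4 * 1024**3;  # assumed 4 GB of RAM

    # Rough total footprint of the hash.
    my $estimate = $n_items * ($avg_item_bytes + $overhead);
    printf "estimated hash size: %.1f MB\n", $estimate / 1024**2;

    # The "start worrying" threshold from the rule of thumb above.
    print "worth worrying about\n" if $estimate > $ram_bytes / 5;
    ```

    For 50,000 modest records the estimate lands in the tens of megabytes, nowhere near the threshold, which is why the monks above aren't worried.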

    Regarding performance, I find that it's rarely worth the time to discuss the performance of a language feature. Computers are so fast now that even slow, horrible algorithms are often just fine. Normally I think about performance only if I have actual code that performs too slowly to meet requirements. Then I don't worry about guessing; it's time to measure the code to find out where it's slow. Once you do that, that's the time to start asking questions about the performance of certain code constructs.


    ...who has been a little snarkier than usual lately, possibly as a result of having to switch to these consarned bifocals!

Re: perl hashes
by Marshall (Canon) on Jul 15, 2009 at 07:03 UTC
    I routinely use hashes of 50K or 100K keys, and sometimes several at once! This is not an issue if you have enough memory. Also, the Perl hash table is very efficient performance-wise. If you know exactly what you are looking for, no other data structure is as efficient.

    I suspect that your trouble will be the same as mine: where does the stuff that goes into the hash come from? In my case the data comes from text files, and the "care and feeding" of those text files just dwarfs any hash table initialization. Getting the data off the disk and ready to be inserted into the hash, then creating the text output files, is what takes the VAST majority of the cycles: split(), regexes and such. I/O is "expensive". Anyway, 50K keys is not what I would consider a huge hash. If your app gets slow, look at your I/O code and benchmark it. You can make some significant performance gains there with some experimentation.

    Anyway, I/O is gonna be the performance problem, not the hash itself.
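    A minimal sketch of the pattern described above: build a hash from a text file, where reading, chomping and splitting the lines (the I/O side) usually dominates, not the hash inserts. The file name and tab-separated format are invented for illustration; the sketch writes its own sample file so it is self-contained.

    ```perl
    use strict;
    use warnings;

    # Create a small sample input file (key<TAB>value per line).
    open my $out, '>', 'records.txt' or die "records.txt: $!";
    print {$out} "key$_\tvalue$_\n" for 1 .. 1_000;
    close $out;

    # The actual pattern: read each line, split it, insert into the hash.
    my %record;
    open my $fh, '<', 'records.txt' or die "records.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        $record{$key} = $value;
    }
    close $fh;
    unlink 'records.txt';    # clean up the sample file

    printf "loaded %d keys\n", scalar keys %record;
    ```

    If this loop is slow, profile the read-and-split side before blaming the hash.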
