PerlMonks  

Re: "Just use a hash": An overworked mantra?

by RichardK (Priest)
on Nov 17, 2011 at 09:36 UTC (#938564)


in reply to "Just use a hash": An overworked mantra?

But, if I can quote Donald Knuth: "Premature optimization is the root of all evil."

You need to profile the real code to find out where the bottleneck is. As BrowserUK points out, the time to read the file swamps any improvement you might make.

Using a hash as the default option is still probably the right choice: it's simple, understandable, and works when your data is strings, dates, or anything else. An array on the other hand can only be used when your data is an integer, and you know that the range of the data is small. So in this very specific case an array will be better, but not in the general case.
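
To make the comparison concrete, here is a minimal sketch of counting with both structures. The sample values in @data are made up for illustration and are assumed to be small non-negative integers, which is the only case where the array version applies.

```perl
use strict;
use warnings;

# Hypothetical sample data: small non-negative integers.
my @data = (3, 1, 3, 0, 2, 3, 1);

# Hash version: works for any kind of key (strings, dates, ...).
my %count_h;
$count_h{$_}++ for @data;

# Array version: only valid because every value is a small
# non-negative integer that can serve directly as an index.
my @count_a;
$count_a[$_]++ for @data;

# Both structures hold the same tallies.
for my $n (0 .. $#count_a) {
    printf "%d: hash=%d array=%d\n",
        $n, $count_h{$n} // 0, $count_a[$n] // 0;
}
```

If the values were strings, or integers spread over a huge range, only the hash version would remain practical.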

Thanks for the interesting post, but to answer your question -- NO it's not! :)


Re^2: "Just use a hash": An overworked mantra?
by blakew (Monk) on Nov 17, 2011 at 18:00 UTC
    "An array on the other hand can only be used when your data is an integer"

    I think you meant "maps 1:1 with integers."

      In this case, "data" is a bunch of integers. In moving from a hash to an array, the "keys" do have to be integers. If you're just counting, nothing else matters, but if it is about key-value pairs, that move is still valid as long as the key is a (non-negative) integer. The value(s) in that pair do not have to be.

      Another thing not yet mentioned is that with datasets this large, it is not only the data itself that limits how much fits in memory: the overhead of perl's internal structures adds to that. Just today I checked the in-memory size of a 1 MB .csv file represented as an array(ref) of array(ref)s: it grew to 10 MB! A hash takes slightly more overhead than an array (most overhead goes into converting a single number into a refcounted SV), so when on the verge of swapping, an array might actually be much faster than a hash.


      Enjoy, Have FUN! H.Merijn
        Your data can be characters, in which case use ord to map them to integers for the key. The point is that your data just needs to be mappable to integers, not necessarily integers themselves.
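
        A minimal sketch of that mapping, assuming a made-up input string in $text: each character is turned into an array index with ord, and chr maps the index back for printing.

```perl
use strict;
use warnings;

# Hypothetical input text, chosen only for illustration.
my $text = "hello world";

# Use each character's code point as the array index.
my @count;
$count[ ord($_) ]++ for split //, $text;

# Report only the slots that were actually used,
# mapping each index back to its character with chr.
for my $code (grep { defined $count[$_] } 0 .. $#count) {
    printf "'%s': %d\n", chr($code), $count[$code];
}
```

        This stays compact for plain ASCII text; with wide characters the array can become large and sparse, at which point a hash is the simpler choice again.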
