Re^3: "Just use a hash": An overworked mantra?

by Tux (Monsignor)
on Nov 17, 2011 at 18:38 UTC (#938659)

in reply to Re^2: "Just use a hash": An overworked mantra?
in thread "Just use a hash": An overworked mantra?

In this case, the "data" is a bunch of integers. When moving from a hash to an array, the "keys" do have to be integers. If you're just counting, nothing else matters. If it is about key-value pairs, the move is still valid as long as the key is a non-negative integer. The value(s) in that pair do not have to be.
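A minimal sketch of that move, assuming a made-up list of small integers (`@data` and the other variable names are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample data: small non-negative integers to be counted.
my @data = (3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5);

# Hash version: works for any kind of key.
my %count_h;
$count_h{$_}++ for @data;

# Array version: the integer itself is the index.
my @count_a;
$count_a[$_]++ for @data;

# Key-value pairs work the same way; only the key has to be an integer.
my @name_of;
$name_of[42] = 'answer';    # the array now extends up to index 42

printf "5 seen %d times (hash), %d times (array)\n", $count_h{5}, $count_a[5];
```

Keep in mind that an array grows to its largest index, so this only pays off when the keys are reasonably dense.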

Another thing not yet mentioned is that with datasets this large, it is not only the data itself that eats into the available memory: the overhead of perl's internal structures adds to that. Just today I checked how much memory a 1 MB .csv file took when represented as an array(ref) of array(ref)s: it grew to 10 MB! A hash takes slightly more overhead than an array (most overhead goes into converting a single number into a refcounted SV), so when on the verge of swapping, an array might actually be much faster than a hash.
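One way to check this kind of growth yourself, as a rough sketch: it assumes the CPAN module Devel::Size is installed (it is not core, so it is loaded inside an eval), and the 1000 rows of 10 integers are invented stand-in data, not the file mentioned above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed CSV file: 1000 rows of 10 integers.
my @rows = map { [ map { $_ * 7 } 1 .. 10 ] } 1 .. 1000;

# Size of the same data as flat text: comma-separated, one newline per row.
my $text_bytes = 0;
$text_bytes += length(join(',', @$_)) + 1 for @rows;

# Devel::Size is a CPAN module, not part of core perl, so load it defensively.
my $struct_bytes = eval {
    require Devel::Size;
    Devel::Size::total_size(\@rows);
};

if (defined $struct_bytes) {
    printf "text: %d bytes, array of arrays: %d bytes (x%.1f)\n",
        $text_bytes, $struct_bytes, $struct_bytes / $text_bytes;
}
else {
    print "Devel::Size not installed; text size is $text_bytes bytes\n";
}
```

The exact ratio depends on the perl build and on the data, so treat the number it prints as an indication only.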

Enjoy, Have FUN! H.Merijn

Re^4: "Just use a hash": An overworked mantra?
by blakew (Monk) on Nov 17, 2011 at 19:47 UTC
    Your data can be characters, in which case you can use ord to map them to integers for the key. The point is that your data just needs to be mappable to integers; it does not necessarily have to be integers itself.
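A small sketch of the ord idea, assuming the task is counting characters in a made-up string (`$text` and `@count` are illustrative names):

```perl
use strict;
use warnings;

# Hypothetical input: count how often each character occurs.
my $text = 'just use a hash';

# ord turns each character into its code point, which serves as the array index.
my @count;
$count[ ord($_) ]++ for split //, $text;

# chr maps the index back to the character when reporting.
for my $code (grep { $count[$_] } 0 .. $#count) {
    printf "'%s' => %d\n", chr($code), $count[$code];
}
```

For plain ASCII the array stays tiny; with arbitrary Unicode input the indices can get large and sparse, and a hash may be the better fit again.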
