http://www.perlmonks.org?node_id=345208


in reply to Perl Internals: Hashes

First the easy question: hashes are a kind of associative array.

The hash code can be found in hv.c in the perl distribution. The routine

HV * Perl_newHV(pTHX) { register HV *hv; register XPVHV* xhv; hv = (HV*)NEWSV(502,0); sv_upgrade((SV *)hv, SVt_PVHV); xhv = (XPVHV*)SvANY(hv); SvPOK_off(hv); SvNOK_off(hv); #ifndef NODEFAULT_SHAREKEYS HvSHAREKEYS_on(hv); /* key-sharing on by default */ #endif xhv->xhv_max = 7; /* HvMAX(hv) = 7 (start with 8 buckets +) */ xhv->xhv_fill = 0; /* HvFILL(hv) = 0 */ xhv->xhv_pmroot = 0; /* HvPMROOT(hv) = 0 */ (void)hv_iterinit(hv); /* so each() will start off right */ return hv; }
creates a new hash. We can see that the basic memory needed is 502 bytes and intially, 8 buckets are created.

For growing a hash as elements are added, the hashing algorithm is a one-at-a-time method. To grow the hash, I think the relevant routine is

STATIC void S_more_he(pTHX) { register HE* he; register HE* heend; XPV *ptr; New(54, ptr, 1008/sizeof(XPV), XPV); ptr->xpv_pv = (char*)PL_he_arenaroot; PL_he_arenaroot = ptr; he = (HE*)ptr; heend = &he[1008 / sizeof(HE) - 1]; PL_he_root = ++he; while (he < heend) { HeNEXT(he) = (HE*)(he + 1); he++; } HeNEXT(he) = 0; }
which looks like a linear algorithm to me.

-Mark

Replies are listed 'Best First'.
Re: Re: Perl Internals: Hashes
by hv (Prior) on Apr 14, 2004 at 22:11 UTC

    ... hv = (HV*)NEWSV(502,0); ...

    ... creates a new hash. We can see that the basic memory needed is 502 bytes

    That's not correct: the first parameter to NEWSV is an identifier, used only for malloc debugging. It is the second parameter that provides the size, and a value of zero means "give me a new empty SV - I'll fill it in myself".

    It is the following call to sv_upgrade((SV *)hv, SVt_PVHV) that actually turns this into something capable of storing a hash. In particular it allocates the struct xpvhv, defined in hv.h:

    struct xpvhv { char * xhv_array; /* pointer to malloced string */ STRLEN xhv_fill; /* how full xhv_array currently is */ STRLEN xhv_max; /* subscript of last element of xhv_array */ IV xhv_keys; /* how many elements in the array */ NV xnv_nv; /* numeric value, if any */ #define xhv_placeholders xnv_nv MAGIC* xmg_magic; /* magic for scalar array */ HV* xmg_stash; /* class package */ I32 xhv_riter; /* current root of iterator */ HE *xhv_eiter; /* current entry of iterator */ PMOP *xhv_pmroot; /* list of pm's for this package */ char *xhv_name; /* name, if a symbol table */ };

    Also the number of buckets does increase in powers of 2, as shown by the example code in demerphq's response. S_more_he() is not relevant to this - it is (I think) fetching a new structure from a malloc arena. The routine you want is S_hsplit() which quite early on does a nice clear:

    register I32 newsize = oldsize * 2;

    Hugo