Data structures benchmark (pack vs. arrays vs. hashes vs. strings) by spx2 (Deacon)
on Dec 09, 2011 at 23:38 UTC
spx2 has asked for the wisdom of the Perl Monks concerning the following question:
Statement of the problem
Basically, I need a fast way to serialize/deserialize many arrays of 6 elements each. All of these arrays have the same layout; I just have to transform each one into some form I can pass to other methods/modules, etc.
The problem is that I need to do this faster than the current implementation, which splits a string containing numbers and puts those numbers into the values of a hash.
At the moment these arrays of 6 elements are from some data source where they come as comma-separated strings, so a split(/,/,$string) can't be avoided.
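For reference, the current split-into-a-hash approach might look roughly like the sketch below. The field names (`id`, `type`, `stamp`, `x`, `y`, `z`) and the sub name `to_hash` are hypothetical; the post does not say what the six fields are called.

```perl
use strict;
use warnings;

# Hypothetical names for the six fields, in input order.
my @FIELDS = qw(id type stamp x y z);

# Current approach: split the comma-separated record and store the
# values in a hash keyed by field name, using a hash slice.
sub to_hash {
    my ($string) = @_;
    my %rec;
    @rec{@FIELDS} = split /,/, $string;
    return \%rec;
}

my $rec = to_hash("3,-7,1323473880,1.5,2.25,-0.5");
print "$rec->{stamp} $rec->{y}\n";
```

Every record pays for a `split` plus six hash stores, and every read pays for a hash lookup by key.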
I've identified the data types I'll be getting: each record consists of 2 integers, an unsigned long and 3 double-precision floats, in this order.
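One way to map this layout onto `pack`/`unpack` is the template `iiL!ddd`: `i` is a native signed int, `L!` a native unsigned long and `d` a native double. This is an assumption about what "integer" and "unsigned long" mean on the author's platform; if the packed strings had to travel between machines, fixed-size codes such as `l` (32-bit signed) and `L` (32-bit unsigned) would be the safer choice. The sub names `serialize` and `deserialize` are hypothetical.

```perl
use strict;
use warnings;

# Assumed layout: 2 native ints, 1 native unsigned long, 3 native doubles.
my $TEMPLATE = 'iiL!ddd';

# Serialize: split the text record and pack the six values into bytes.
sub serialize {
    my ($string) = @_;
    return pack $TEMPLATE, split /,/, $string;
}

# Deserialize: unpack the bytes back into a list of six values.
sub deserialize {
    my ($packed) = @_;
    return unpack $TEMPLATE, $packed;
}

my $packed = serialize("3,-7,1323473880,1.5,2.25,-0.5");
my @fields = deserialize($packed);
printf "%d bytes: %s\n", length $packed, join ',', @fields;
```

The byte count depends on the platform's native sizes (`pack` adds no alignment padding), which is why the size is printed rather than assumed.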
So I take each string, split it into its fields, serialize them and push the result onto an array; once all the strings are processed, I return that big array. On the other end, I receive the array and have to deserialize each element in order to access the values.
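A minimal sketch of that producer/consumer flow, assuming the serialized form is a packed string with the `iiL!ddd` layout. `serialize_all` and `sum_last_field` are hypothetical names; the consumer just sums one field to show that every access goes through `unpack`.

```perl
use strict;
use warnings;

my $TEMPLATE = 'iiL!ddd';    # assumed field layout

# Producer (hypothetical): turn every comma-separated line into one
# packed record and return the whole batch as an array reference.
sub serialize_all {
    my @lines = @_;
    my @records;
    push @records, pack($TEMPLATE, split /,/, $_) for @lines;
    return \@records;
}

# Consumer (hypothetical): unpack each record again to reach a value;
# here it just sums the last field (index 5).
sub sum_last_field {
    my ($records) = @_;
    my $sum = 0;
    $sum += (unpack $TEMPLATE, $_)[5] for @$records;
    return $sum;
}

my $records = serialize_all("1,2,3,0.5,1.5,2.5", "4,5,6,1.0,2.0,3.0");
print scalar(@$records), " records, sum of last field = ",
    sum_last_field($records), "\n";
```

Note that the producer still runs `split` on every line, so packing adds work on top of the text parsing rather than replacing it.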
Now the question is which way of serializing/deserializing the data is fastest, so I wrote a benchmark to measure exactly that. I was interested in how fast the object could be created (serialization) and how fast its values could be accessed (which in some cases required deserialization).
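The construction half of such a benchmark can be sketched with the core `Benchmark` module's `cmpthese`. This is a hypothetical reconstruction, not the author's `pack_vs_hash_pm.pl`: the input line, the field names and especially the body of `construct_comma_separated` (one guess at the "strings" variant) are assumptions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line     = "3,-7,1323473880,1.5,2.25,-0.5";
my $TEMPLATE = 'iiL!ddd';                  # assumed field layout
my @FIELDS   = qw(id type stamp x y z);    # hypothetical field names

# Each sub builds one record from the same input line.
my %construct = (
    # One guess at the "strings" variant: keep the comma-separated text.
    construct_comma_separated => sub { join ',', split /,/, $line },
    construct_hash            => sub { my %h; @h{@FIELDS} = split /,/, $line; \%h },
    construct_packed          => sub { pack $TEMPLATE, split /,/, $line },
    construct_array           => sub { [ split /,/, $line ] },
);

# A negative count runs each sub for at least that many CPU seconds.
cmpthese(-1, \%construct);
```

All four variants share the same `split`, so the differences in the resulting table come only from what happens after it.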
I'm going to add the code I wrote for this benchmark, but the results have left me with some unanswered questions:
I am asking this because when I first started the benchmark I was thinking "Oh boy! I'm gonna optimize this thing by using (un)pack for (de)serialization instead of hashes or arrays", so that's what I expected. What I got is way off from that expectation, and I don't know where I went wrong.
Possible improvements (haven't tried them)
I also thought of an alternative to all of this: change the code that generates the data from the data source so that it emits fixed-size records in binary format instead of text.
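With fixed-size binary records the consumer no longer needs `split` at all: it can read exactly one record's worth of bytes at a time and `unpack` it. A minimal sketch, assuming the same hypothetical `iiL!ddd` layout and using an in-memory filehandle as a stand-in for the real data source:

```perl
use strict;
use warnings;

my $TEMPLATE = 'iiL!ddd';                         # assumed record layout
my $REC_SIZE = length pack($TEMPLATE, (0) x 6);   # bytes per record

# Stand-in for a producer that writes binary records instead of text.
my @input = ([1, 2, 3, 0.5, 1.5, 2.5], [4, 5, 6, 1.0, 2.0, 3.0]);
my $blob  = join '', map { pack $TEMPLATE, @$_ } @input;

# Consumer: read one fixed-size record at a time and unpack it.
open my $fh, '<', \$blob or die "cannot open in-memory handle: $!";
binmode $fh;
my @rows;
while (read($fh, my $buf, $REC_SIZE) == $REC_SIZE) {
    push @rows, [ unpack $TEMPLATE, $buf ];
}
close $fh;

printf "%d records of %d bytes each\n", scalar @rows, $REC_SIZE;
```

Because every record has the same length, the reader never has to scan for delimiters; whether this beats `split` in practice would still need to be benchmarked.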
I also thought about replacing unpack with a homebrewed, much more limited XS version of it (I haven't worked on it yet). Would it, in principle, be possible to write something faster than the built-in unpack?
user@garage:~$ perl pack_vs_hash_pm.pl
                               Rate construct_comma_separated construct_hash construct_packed construct_array
construct_comma_separated  699301/s                        --           -13%             -43%            -70%
construct_hash             800000/s                       14%             --             -35%            -66%
construct_packed          1234568/s                       77%            54%               --            -47%
construct_array           2325581/s                      233%           191%              88%              --
(warning: too few iterations for a reliable count)
                            Rate access_comma_separated access_packed access_hash access_array
access_comma_separated  970874/s                     --          -23%        -33%         -69%
access_packed          1265823/s                    30%            --        -13%         -59%
access_hash            1449275/s                    49%           14%          --         -54%
access_array           3125000/s                   222%          147%        116%           --
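The "(warning: too few iterations for a reliable count)" line comes from the `Benchmark` module when a timed run is too short to give a trustworthy figure. Passing a negative count to `cmpthese` makes it run each sub for at least that many CPU seconds instead of a fixed number of iterations. A minimal sketch of the access comparison done that way; the record, layout and sub bodies are hypothetical stand-ins for the original script:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $TEMPLATE = 'iiL!ddd';                          # assumed layout
my @values   = (3, -7, 1323473880, 1.5, 2.25, -0.5);
my $packed   = pack $TEMPLATE, @values;
my $array    = [@values];

# Each sub reads the fourth field (index 3) from one representation.
my %access = (
    access_packed => sub { my $x = (unpack $TEMPLATE, $packed)[3]; $x },
    access_array  => sub { my $x = $array->[3]; $x },
);

# -1: run each sub for at least 1 CPU second rather than a fixed count.
cmpthese(-1, \%access);
```

The packed variant has to decode all six fields through `unpack` to return one of them, while the array variant is a single element lookup.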