PerlMonks  

Data structures benchmark(pack vs. arrays vs. hashes vs. strings)

by spx2 (Chaplain)
on Dec 09, 2011 at 23:38 UTC ( #942747=perlquestion )
spx2 has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks,

Statement of the problem

Basically I'm in a situation where I need a fast way to serialize/deserialize many arrays of 6 elements each. The arrays all have the same internal semantics; I just need to transform them into some form I can pass to other methods/modules etc.

The problem is I need to do this faster than the current implementation, which splits a string containing the numbers and stores them as the values of a hash.

At the moment these 6-element arrays come from a data source as comma-separated strings, so a split(/,/,$string) can't be avoided.

I've identified that these are the data types that I'll be getting.

Basically the data which needs to be serialized consists of 2 integers, an unsigned long, and 3 double-precision floats, in this order.

So I get a string like that, decompose it into its pieces, serialize each record, add it to an array, and after I've processed all the strings I return that big array. Of course, on the other end, I receive the array and need to deserialize each record in order to access the values.

Now the question is which is the fastest way to serialize/deserialize the data. So I made a benchmark to see exactly how fast each approach is. I was interested in how fast creation of the object was (serialization), and how fast access to the object was (which in some cases required deserialization).
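For concreteness, the round trip I have in mind looks roughly like this (a minimal sketch; the field values are made up, and the leading 's' slot is the one the real code reserves for something else):

```perl
use strict;
use warnings;

# One incoming record: 2 integers, an unsigned long, 3 doubles,
# comma-separated (values here are just placeholders).
my $line = "3411,2,3,45,5.33,6.12";

# serialize: split the text, pack into a fixed-layout binary blob
my @fields = split /,/, $line;
my $blob   = pack 'sI2Lddd', 0, @fields;    # leading 's' reserved

# deserialize: unpack the blob back into scalars
my ( $reserved, $i1, $i2, $ul, $d1, $d2, $d3 ) = unpack 'sI2Lddd', $blob;
print "$i1 $i2 $ul $d1 $d2 $d3\n";          # 3411 2 3 45 5.33 6.12
```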

Questions

I'm gonna add the code which I wrote to benchmark this below, but it has left me with some unanswered questions:

  1. Why is packing and returning six values slower than constructing a whole Perl AV data structure, creating an SvPV for each position, copying all the elements into it and then returning that? (compare the times of construct_packed with construct_array)
  2. Why is unpacking a blob of 6 values into 6 different SV structures (among which 2 SvIV, 3 SvPVNV and one SvUV) slower than dereferencing a whole array and copying its contents into 6 different scalars (of the same types mentioned before)? (compare the times of access_array and access_packed)
  3. Why is accessing a hash faster than just unpacking a string filled with data and accessing the contents? (compare access_packed and access_hash)

I ask because when I first started the benchmark I was thinking "Oh boy! I'm gonna optimize this thing by using (un)pack for (de)serialization instead of hashes or arrays" -- so that's what I was expecting, and what I got is way off my initial expectation, and I don't know where I'm wrong.

Possible improvements (haven't tried them)

I also thought of an alternative to all of this: change the code that generates the data at the source so that it emits fixed-size records in binary format instead of text.

I also thought about ditching unpack for a homebrewed, much more limited XS version of it (haven't worked on it yet). Would it, in principle, be possible to write something faster than the current unpack?
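For the first idea, the reader side could then carve fixed-size records straight out of the byte stream with substr + unpack, with no text parsing at all. A sketch (a string stands in for the real data source):

```perl
use strict;
use warnings;

# Sketch: if each record were a fixed-size binary blob (layout 'sI2Lddd'),
# the reader could slurp the stream and carve out records by offset.
my $record_size = length pack 'sI2Lddd', (0) x 7;

# Two records concatenated, standing in for the data source's output.
my $stream = pack( 'sI2Lddd', 0, 1, 2, 3, 4.5, 5.5, 6.5 )
           . pack( 'sI2Lddd', 0, 7, 8, 9, 1.5, 2.5, 3.5 );

for my $off ( 0, $record_size ) {
    my @fields = unpack 'sI2Lddd', substr $stream, $off, $record_size;
    print "@fields[1..6]\n";    # skip the reserved leading 's' slot
}
```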

Benchmark result

user@garage:~$ perl pack_vs_hash_pm.pl
                               Rate construct_comma_separated construct_hash construct_packed construct_array
construct_comma_separated  699301/s                        --           -13%             -43%            -70%
construct_hash             800000/s                       14%             --             -35%            -66%
construct_packed          1234568/s                       77%            54%               --            -47%
construct_array           2325581/s                      233%           191%              88%              --
            (warning: too few iterations for a reliable count)
                            Rate access_comma_separated access_packed access_hash access_array
access_comma_separated  970874/s                     --          -23%        -33%         -69%
access_packed          1265823/s                    30%            --        -13%         -59%
access_hash            1449275/s                    49%           14%          --         -54%
access_array           3125000/s                   222%          147%        116%           --


Code

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw/cmpthese/;

##################
#  SERIALIZING   #
##################
cmpthese( 1_000_000, {
    construct_hash => sub {
        my @a = (1..6);
        return +{
            k1 => $a[0], k2 => $a[1], k3 => $a[2],
            k4 => $a[3], k5 => $a[4], k6 => $a[5],
        };
    },
    construct_packed => sub {
        my @a = (1..6);
        # ignore the front 's', I'm using that for something else
        return pack( 'sI2Lddd', 6, @a );
    },
    # old cache
    construct_comma_separated => sub {
        my @a = (1..6);
        return "$a[0],$a[1],$a[2],$a[3],$a[4],$a[5]";
    },
    construct_array => sub {
        my @a = (1..6);
        return \@a;
    },
});

##################
# DESERIALIZING  #
##################
my @access_vals     = 1..6;
my $access_vals_ref = \@access_vals;
my $access_h = +{
    k1 => $access_vals[0], k2 => $access_vals[1], k3 => $access_vals[2],
    k4 => $access_vals[3], k5 => $access_vals[4], k6 => $access_vals[5],
};
# ignore the front 's', I'm using that for something else
my $access_packed          = pack( 'sI2Lddd', 6, @access_vals );
my $access_comma_separated = "3411,2,3,45,5.33,6.12";

cmpthese( 1_000_000, {
    access_hash => sub {
        my $k1 = $access_h->{k1};
        my $k2 = $access_h->{k2};
        my $k3 = $access_h->{k3};
        my $k4 = $access_h->{k4};
        my $k5 = $access_h->{k5};
        my $k6 = $access_h->{k6};
    },
    access_packed => sub {
        # ignore the front 's', I'm using that for something else
        my ( $values_packed, $k1, $k2, $k3, $k4, $k5, $k6 )
            = unpack( 'sI2Lddd', $access_packed );
    },
    access_comma_separated => sub {
        my ( $k1, $k2, $k3, $k4, $k5, $k6 )
            = split( /,/, $access_comma_separated );
    },
    access_array => sub {
        my ( $k1, $k2, $k3, $k4, $k5, $k6 ) = @$access_vals_ref;
    },
});

Re: Data structures benchmark(pack vs. arrays vs. hashes vs. strings)
by AnomalousMonk (Monsignor) on Dec 10, 2011 at 00:47 UTC

    I have serious, basic confusion about your question. According to my understanding of Serialization or Marshalling, only the pack/unpack approach you show qualifies at all. Creating an anonymous array or hash and then accessing an element or elements of the array or hash does not qualify because it cannot be done, e.g., over a network or with an intervening step of storage to a disk.

    Can you clarify your definitions and/or application?

      the purpose is to take a string, split it and store it in memory in such a way that you can pass it around and not need to split it again when receiving it in some other part of the program.

        But if you are going to be operating entirely within a program and not, e.g., storing to disk or transmitting to another computer over a network, why not just pass array or hash references (or object references) around? Again, I don't understand the application.

        the purpose is to take a string, split it and store it in memory in such a way that you can pass it around and not need to split it again when receiving it in some other part of the program.

        Then nothing will be as fast as constructing an array of arrays and passing a reference to it around. It could not be so.

        Reading between the lines, your main problem seems to be that you are insisting on copying the subarrays to local named scalars each time before using them, rather than just using them in situ.

        Ie. You are doing something like:

        sub process {
            my( $AoA, $thingToProcess ) = @_;
            my( $v1, $v2, $v3, $v4, $v5, $v6, $v7 ) = @{ $AoA->[ $thingToProcess ] };
            my( $r1, $r2, $r3, $r4, $r5, $r6, $r7 )
                = ( ... some calculation(s) involving $v1 .. $v7 ... );
            @{ $AoA->[ $thingToProcess ] } = ( $r1, $r2, $r3, $r4, $r5, $r6, $r7 );
            return;
        }

        When you could be doing:

        sub process {
            my( $AoA, $thingToProcess ) = @_;
            $AoA->[ $thingToProcess ][ 3 ]
                = $AoA->[ $thingToProcess ][ 1 ] * $AoA->[ $thingToProcess ][ 2 ];
            ...
            return;
        }

        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        The start of some sanity?

Re: Data structures benchmark(pack vs. arrays vs. hashes vs. strings)
by BrowserUk (Pope) on Dec 10, 2011 at 00:56 UTC

    First, I get different results, with the pack construction coming out fastest. These are the results of your benchmark (unchanged except for shortening the names) on my system:

    c:\test>942747.pl
                 Rate   hash  comma  array packed
    hash     186995/s     --   -27%   -76%   -77%
    comma    255572/s    37%     --   -68%   -69%
    array    789101/s   322%   209%     --    -4%
    packed   819141/s   338%   221%     4%     --
                 Rate  comma packed   hash  array
    comma    273531/s     --   -62%   -70%   -86%
    packed   714529/s   161%     --   -21%   -63%
    hash     900220/s   229%    26%     --   -53%
    array   1925183/s   604%   169%   114%     --

    Now to your questions:

    1. Why is packing and returning six values slower than constructing a whole Perl AV data structure, creating each SvPV for each position, copying all the elements inside it and then returning that ? (compare the times of construct_packed with construct_array)

      Your construct_array doesn't "construct a whole Perl AV data structure, creating each SvPV for each position, copying all the elements inside it".

      It simply returns a reference to the existing array that you constructed to hold the values.

      But you also construct that same array inside each of the other construct_* benchmark subs before getting to the additional bit they have to do.

      So the surprise -- on my system -- is that the pack version manages to be quicker. Even though it is only a little, it is consistent.

      Ie. construct_packed() does everything that construct_array() does, and then a little bit more, and still comes out quicker.

      This is a sure sign that you're not benchmarking what you think you are.

    2. Why is unpacking a blob of 6 values to 6 different SV structures(among which 2 SvIV and 3 SvPVNV and one SvUV ) slower than dereferencing a whole array and copying contents to 6 different scalars(the same types mentioned before) ? (compare times of access_array and access_packed)

      Again, both access_array() and access_packed() are allocating six scalars to hold the results and copying the values retrieved into them.

      But access_packed() first also has to: a) parse the template string; b) allocate scalars (on the stack) to hold the unpacked values; c) unpack those values. How could it be quicker?

    3. Why is accessing a hash faster than just unpacking a string filled with data and accessing the contents ? (compare access_packed and access_hash)

      Again, access_packed() also has to allocate the six (actually seven, but you asked us to ignore that) scalars to hold the results. But whereas access_hash() only has to dereference a hash ref and copy the scalars, access_packed() has to: a) call a subroutine (unpack); b) parse the template; c) allocate 7 more scalars on the stack; d) assign the values extracted from the packed scalar to those stack vars; e) copy the values to the named scalars in the caller.

    The bottom line is that both your benchmark and your expectations are fundamentally flawed.

    Your expectations because you are imagining that pack & unpack should be doing less work, but in fact they are doing more. Accessing native arrays and hashes is a "native operation": it doesn't require calling a subroutine, setting up and tearing down the stack frames for that subroutine, or parsing the templates as required by pack & unpack.

    Your benchmark is flawed because you are attempting to benchmark very small operations, but ignoring the fact that all the setup required to benchmark those operations -- initialising the input arrays, calling the benchmark subroutines, constructing the scalars to hold the return values etc. -- is a) common to all the benchmarks; b) swamping the time taken to perform the bit you are trying to test.
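    One way to reduce the swamping is to hoist the shared setup out of the timed subs, so each sub does only the operation under test. A minimal sketch of the idea (not a full rework of the original benchmark):

    ```perl
    use strict;
    use warnings;
    use Benchmark qw/cmpthese/;

    # Hoist the shared setup: build the input array once, outside the
    # timed subs, so each sub times only the operation of interest.
    my @a = (1..6);

    cmpthese( 500_000, {
        construct_packed => sub { pack 'sI2Lddd', 6, @a },
        construct_array  => sub { [ @a ] },   # copies, so both produce a fresh value
    });
    ```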

    I sympathise with the need to optimise your code, but you are going about it in the wrong way. If you'd care to post the actual code; or perhaps a cut down but working example that demonstrates the part that is taking too long, then you'd likely get some very good help in optimising it. In the not so distant past we've seen one or two orders of magnitude improvements in code posted here, that comes just from doing things slightly differently. (As opposed to major algorithmic improvements which can yield even greater savings.)



      I am very surprised you got those results. I haven't tried the code on Windows, but yes, it's a bit bizarre that only the second report looks the same as on my machine. The targeted operating system for this code is Linux, so I think I will try it on multiple Linux machines to see whether there's any difference. But Windows should've had the same result really.. :|

      1. yes, that's true, but please take into consideration that since I do that every time, each of the benchmarked scenarios has this operation inside it, so when comparing the times the ordering is preserved. Does this sound like a sane assumption? I hope it does, but I'm not an expert so maybe I'm wrong

      2.

      • a) I fully agree, this might be the main cause. However, in the case of assigning a value to a scalar, isn't there some "typechecking" done to decide whether it's gonna be stored in an SvIV, SvUV, SvPV or SvPVNV? I'm talking about the fact that Perl would need to guess which type the data is in order to know what kind of structure to allocate for it. Whereas if you use unpack, you already tell it what kind of data it's gonna get, so shouldn't that be faster?
      • b) this is done in both access_array and access_packed, so it should take about the same amount of time in both
      • c) this would just need to do something like a memcpy of a given number of bytes from one place to the other, keeping a pointer into the data and galloping over it, right? so it shouldn't be so time-consuming

      3. c), d), e) so these are not the ones on the LHS of unpack, these are additional ones for the stack! hmm, any way to avoid that?

      An excellent and very thorough analysis . . . up-voted.

Node Type: perlquestion [id://942747]
Approved by davido
Front-paged by chrestomanci