http://www.perlmonks.org?node_id=942752


in reply to Re: Data structures benchmark(pack vs. arrays vs. hashes vs. strings)
in thread Data structures benchmark(pack vs. arrays vs. hashes vs. strings)

the purpose is to take a string, split it and store it in memory in such a way that you can pass it around and not need to split it again when receiving it in some other part of the program.

Re^3: Data structures benchmark(pack vs. arrays vs. hashes vs. strings)
by BrowserUk (Patriarch) on Dec 10, 2011 at 01:08 UTC
    the purpose is to take a string, split it and store it in memory in such a way that you can pass it around and not need to split it again when receiving it in some other part of the program.

    Then nothing will be as fast as constructing an array of arrays and passing a reference to it around. It could not be otherwise.
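A minimal sketch of that approach, assuming comma-separated input; the sample data and the sub name `sum_of_products` are hypothetical and not taken from the original benchmark:

```perl
use strict;
use warnings;

# Hypothetical input: one comma-separated record per string.
my @lines = ( 'a,1,2', 'b,3,4', 'c,5,6' );

# Split each string exactly once and keep the fields in an array of arrays.
my @AoA = map { [ split /,/ ] } @lines;

# Hypothetical consumer: it receives a reference, so no fields are copied
# and nothing has to be split again.
sub sum_of_products {
    my( $AoA ) = @_;
    my $total = 0;
    $total += $_->[ 1 ] * $_->[ 2 ] for @$AoA;
    return $total;
}

my $total = sum_of_products( \@AoA );
print "$total\n";    # prints 44
```

Only the reference crosses the call boundary; the rows themselves stay where they were built.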

    Reading between the lines, your main problem seems to be that you are insisting on copying the subarrays to local named scalars each time before using them, rather than just using them in situ.

    I.e., you are doing something like:

    sub process {
        my( $AoA, $thingToProcess ) = @_;
        my( $v1, $v2, $v3, $v4, $v5, $v6, $v7 ) = @{ $AoA->[ $thingToProcess ] };
        my( $r1, $r2, $r3, $r4, $r5, $r6, $r7 ) = (
            ... some calculation(s) involving $v1, $v2, $v3, $v4, $v5, $v6, $v7 ...
        );
        @{ $AoA->[ $thingToProcess ] } = ( $r1, $r2, $r3, $r4, $r5, $r6, $r7 );
        return;
    }

    When you could be doing:

    sub process {
        my( $AoA, $thingToProcess ) = @_;
        $AoA->[ $thingToProcess ][ 3 ] = $AoA->[ $thingToProcess ][ 1 ] * $AoA->[ $thingToProcess ][ 2 ];
        ...
        return;
    }

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      This is correct. However, writing $AoA->[ $thingToProcess ][ something ] each time could lead to hard-to-understand code.

      Also, if I do the "copying the subarrays to local named scalars" every time in the benchmarks, these cancel out, so basically the benchmark is still valid from this point of view. Do you agree?

        Also, if I do the "copying the subarrays to local named scalars" every time in the benchmarks, these cancel out, so basically the benchmark is still valid from this point of view. Do you agree?

        Not really, no. The problem is you have equations something like:

        (call_time=1000) + (allocate_names=150) + (copy_values=250) + (extra_bit_a=15)
        versus
        (call_time=1000) + (allocate_names=150) + (copy_values=250) + (the_extra_bit=5)

        The extra bit is so small relative to the set-up and tear-down that you cannot accurately measure the differences you are interested in. They just get lost in the noise of the overheads.
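To put numbers on that, here is a small worked calculation using the illustrative cost units from the equations above. The figures are the made-up units from the post, not measurements, and the variable names are hypothetical:

```perl
use strict;
use warnings;

# Illustrative cost units copied from the equations above.
my $overhead = 1000 + 150 + 250;   # call + allocate names + copy values
my $total_a  = $overhead + 15;     # first variant
my $total_b  = $overhead + 5;      # second variant

# The part under test differs by a factor of three ...
my $ratio_of_extras = 15 / 5;

# ... but the measured totals differ by well under one percent.
my $percent_diff = 100 * ( $total_a - $total_b ) / $total_b;

printf "extras: %.1fx, totals: %.2f%%\n", $ratio_of_extras, $percent_diff;
# prints: extras: 3.0x, totals: 0.71%
```

A threefold difference in the code of interest shows up as roughly a 0.7% difference in the totals, which normal run-to-run variation can easily swamp.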

        This is correct. However, writing $AoA->[ $thingToProcess ][ something ] each time could lead to hard-to-understand code.

        I sympathise with this. In this case I would construct the code differently. Instead of calling the subroutines as:

        sub process {
            my( $AoA, $thingToProcess ) = @_;
            $AoA->[ $thingToProcess ][ 3 ] = $AoA->[ $thingToProcess ][ 1 ] * $AoA->[ $thingToProcess ][ 2 ];
            ...
            return;
        }
        ...
        process( $AoA, 123 );

        Do it this way:

        ## Use meaningful names obviously!!
        use constant {
            FIRST => 0, SECOND => 1, THIRD => 2, FOURTH => 3,
            FIFTH => 4, SIXTH => 5, SEVENTH => 6,
        };

        sub process {
            our @s;
            local *s = shift;
            $s[ FOURTH ] = $s[ SECOND ] + $s[ THIRD ];
            ...
            return;
        }

        process( $AoA[ $thingToProcess ] );

        The first line (our) lets us refer to the package variable @s by its short name inside the sub.

        The second line (local) localises the glob *s and aliases @s to the subarray within the external @AoA for the duration of the call.

        The use constant gives us meaningful names for the subarray elements.

        The effect is direct, in-situ access to the subarrays without the need to copy and via short, meaningful names.

        • Aliasing is a very cheap operation -- just a pointer assigned.
        • All data copying is avoided.
        • Short, meaningful names.
        • Constants are resolved at compile time making access very fast.

          Real constants that is! Don't be fooled by the crass, laborious & slow, oxymoronic poor substitutes of "ReadOnly variables".

        The result is clean, safe, very readable and maintainable code that is also efficient.
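A self-contained sketch of the aliasing technique, for anyone who wants to run it. The field names (`ID`, `PRICE`, `QTY`, `TOTAL`), the sample rows, and the alias name `@row` are hypothetical choices for illustration:

```perl
use strict;
use warnings;

# Hypothetical field names; real code would use domain-specific ones.
use constant { ID => 0, PRICE => 1, QTY => 2, TOTAL => 3 };

# Hypothetical data: two rows with an empty TOTAL slot.
my @AoA = ( [ 'x', 2, 3, 0 ], [ 'y', 4, 5, 0 ] );

sub process {
    our @row;              # permit the package variable @row under strict
    local *row = shift;    # alias @row to the passed-in subarray
    $row[ TOTAL ] = $row[ PRICE ] * $row[ QTY ];    # writes through the alias
    return;
}

process( $_ ) for @AoA;

print join( ',', @$_ ), "\n" for @AoA;
# prints:
# x,2,3,6
# y,4,5,20
```

Because `@row` is an alias rather than a copy, the assignment modifies the original subarray, and `local` restores the previous binding of `*row` when the sub returns.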



Re^3: Data structures benchmark(pack vs. arrays vs. hashes vs. strings)
by AnomalousMonk (Archbishop) on Dec 10, 2011 at 01:03 UTC

    But if you are going to be operating entirely within a program and not, e.g., storing to disk or transmitting to another computer over a network, why not just pass array or hash references (or object references) around? Again, I don't understand the application.

      There is a large number of strings that need to be processed (split and stored in some data structure), so storing them in a hash is not optimal; the benchmarks show that an array is faster. But I was expecting pack/unpack to be faster still, since it just packs the fields into a very simple structure and unpacks them the same way. It does not need complex data structures such as hashes or arrays; it just stores everything as binary data. That is why I was expecting it to be faster.
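For comparison, a minimal sketch of what a pack/unpack round trip involves, assuming a length-prefixed template; the sample string and the choice of `(N/a*)*` are illustrative assumptions, not the code that was benchmarked:

```perl
use strict;
use warnings;

# Hypothetical record, split once from a delimited string.
my @fields = split /,/, 'alpha,beta,gamma';

# Pack each field with a 32-bit length prefix into one binary string.
my $packed = pack '(N/a*)*', @fields;

# Every consumer has to unpack again, which builds a fresh list each
# time, much as a second split would.
my @again = unpack '(N/a*)*', $packed;

# An array reference hands over the already-split fields directly.
my $ref = \@fields;

print join( ' ', $again[ 1 ], $ref->[ 1 ] ), "\n";    # prints "beta beta"
```

The packed string is compact, but the receiver still pays for an unpack on every access, whereas the array reference needs no further decoding. That may be one reason the array came out ahead, though only a benchmark can confirm it.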