Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number
 
PerlMonks  

Re^5: Do you really want to use an array there?

by MimisIVI (Acolyte)
on Apr 14, 2008 at 16:57 UTC ( #680338=note: print w/ replies, xml ) Need Help??


in reply to Re^4: Do you really want to use an array there?
in thread Do you really want to use an array there?

First i want to ask you what do you mean with a false impression of it's performance??? I didnt get it..i count the time by using the normal way (time()) ..where is the false perfomance?>??

In your above code you give benchmarks about the packing...what about unpacking??Which is the most important for me..the query responce time..that's what i want to improve ...

Can you give please some benchmarks with the unpacking process,i am still not able to use appropriate this G....t functions(pack/unpack)...



Thanks!!!!


Comment on Re^5: Do you really want to use an array there?
Re^6: Do you really want to use an array there?
by BrowserUk (Pope) on Apr 14, 2008 at 17:36 UTC
    First i want to ask you what do you mean with a false impression of it's performance???

    I wasn't questioning your timing method. My point was that using vec on full 32-bit numbers is directly equivalent to using pack, but slower. Unpacking with vec is much slower:

    C:\test>p1 $packed = pack 'V*', 1 .. 1e6;; cmpthese -5, { unpack => q[ my @nums = unpack 'V*', $packed ], unvec => q[ my @nums; push @nums, vec( $packed, $_, 32 ) for 1 .. + 1e6 ], };; Rate unvec unpack unvec 1.45/s -- -45% unpack 2.64/s 81% --

    But even that belies the real problem. With Elias Gamma, you have to pack/unpack strings of bits that cross byte boundaries, and the only way to do that it to do them 1-bit at a time. That means calling vec for every bit in the string, rather than every 32-bits as you do in your test above. And once you start calling vec for every bit, things slow down dramatically. As you would expect as you are calling the built-in 32 times more frequently. By way of comparison, the equivalent of your test above, but done one bit at a time is:

    use strict; my $wektor = ''; for my $num ( 0 .. 9_999_999 ) { my $binNum = pack 'V', $num; vec( $wektor, $num * 32 + $_, 1 ) = vec( $binNum, $_, 1 ) for 1 .. + 32; } print "Vector's size: " . length( $wektor ) . " bytes\n"; my @vec; my $Aa = time(); for my $num ( 0 .. 9_999_999 ) { my $binNum = ''; vec( $binNum, $_, 1 ) = vec( $wektor, $num * 32 + $_, 1 ) for 1 .. + 32; push @vec, unpack 'V', $binNum; } print "unpack vector in \t", time() - $Aa, " secs...(Oh dear!!!)\n"; __END__ C:\test>junk0 Vector's size: 40000001 bytes unpack vector in 362 secs...(Oh dear!!!)

    But, you still seem to be completely ignoring pack 'w*', @nums, which gives better compression than Elias Gamma, and is faster than anything else to boot.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Look plz in the above thread what i wrote...

      The only reason why i want to use compression in my index is for perfomance reasons..that was my thougths untill now but it seems that i wasnt right..:(..since with the vec function i can decode 3000000 doc ids in 2 seconds and 10 milion in 6 secs!!!)

      Lets forget the Elias code..and use only the vec , unpack V* and w* to benchmark..and finaly to found the winner...

        Ok. Here you go:

        #! perl -slw use strict; use Benchmark qw[ cmpthese ]; our $packedV = pack 'V*', 1 .. 1e6; our $packedW = pack 'w*', 1 .. 1e6; our $packedVec = ''; vec( $packedVec, $_, 32 ) = $_ for 0 .. 1e6 - 1; cmpthese -5, { unpackV => q[ my @nums = unpack 'V*', $packedV; ], unpackW => q[ my @nums = unpack 'w*', $packedW; ], unVec => q[ my @nums; push @nums, vec( $packedVec, $_, 32 ) for 0 .. 1e6 - + 1 ], }; print "$_: ", length( do{ no strict; ${$_} } ) for qw[ packedV packedW packedVec ]; __END__ C:\test>junk0 Rate unVec unpackV unpackW unVec 1.87/s -- -29% -31% unpackV 2.64/s 41% -- -2% unpackW 2.70/s 44% 2% -- packedV: 4000000 packedW: 2983490 packedVec: 4000000

        unpack 'w' is 44 % faster than vec and compresses the data to 75% to boot. Which means less time to transfer from the DB.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://680338]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (9)
As of 2014-12-28 04:33 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (178 votes), past polls