Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re^3: Efficient bit-twiddling in Perl.

by salva (Monsignor)
on Mar 01, 2013 at 09:43 UTC ( #1021200=note: print w/ replies, xml ) Need Help??


in reply to Re^2: Efficient bit-twiddling in Perl.
in thread Efficient bit-twiddling in Perl.

Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.

If you use a random $n, you will see that the table approach becomes actually quite slower than the simple one.

I have tried encoding the table in other ways, but have not been able to find any one good enough:

#! perl -slw use strict; use Time::HiRes qw[ time ]; my @lookup; $#lookup = 0x3ffff; $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x +3f ] for 0 .. 0x3ffff; my( @nxt, @mid, @bot ); $nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12, $mid[ $_ ] = ( $_ & 0xfc0 ) >> 6, $bot[ $_ ] = $_ & 0x3f for 0 .. 0x3ffff; my (@lookup3); $#lookup3 = 0x3ffff; $lookup3[$_ << 6] = [$_ >> 6, $_ & 0x3f] for 0 .. 0xfff; my $lookup4 = 'x' x (3 * (1<<18)); $lookup4 = ''; $lookup4 .= pack CCC => $_ >> 12, ($_>>6) & 0x3f, $_ & 0x3f for 0..0x3 +ffff; my $lookup6 = 'x' x (2 * (1<<12)); $lookup6 = ''; $lookup6 .= pack CC => $_ >> 6, $_ & 0x3f for 0..0xfff; print "tables generated"; our $ITERS //= 10e6; my @n = map int(rand(1<<18)), 1..$ITERS; print "sample data generated"; sub stuff{ # print "@_"; } my $start = time; for my $n (@n) { stuff( ( $n ) >> 18, ( $n & 0x0003f000 ) >> 12, ( $n & 0x00000fc0 ) >> 6, ( $n & 0x0000003f ) ); } printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS +; $start = time; for my $n (@n) { stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } ); } printf " Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS +; $start = time; for my $n (@n) { stuff( $n >> 18, @{ $lookup3[$n & 0x3ffc0] }, $n & 0x3f ); } printf " Lookup3 took: %.12f seconds\n", ( time() - $start ) / $ITERS +; $start = time; for my $n (@n) { stuff( $n >> 18, unpack CCC => substr($lookup4, 3 * ($n & 0x3ffff) +, 3)); } printf " Lookup4 took: %.12f seconds\n", ( time() - $start ) / $ITERS +; $start = time; for my $n (@n) { stuff( $n >> 18, unpack 'x'.(3 * ($n & 0x3ffff)).'CCC' => $lookup4 +); } printf " Lookup5 took: %.12f seconds\n", ( time() - $start ) / $ITERS +; $start = time; for my $n (@n) { stuff( $n >> 18, unpack(CC => substr($lookup6, ($n & 0x3ffc0) >> 5 +, 3)), $n & 0x3f); } printf " Lookup6 took: %.12f seconds\n", ( time() - $start ) / $ITERS +; $start = time; for my $n (@n) { stuff( $n >> 18, unpack('x'.(($n & 0x3ffc0) >> 5).'CC', $lookup6), + $n & 0x3f); } printf " Lookup7 took: %.12f seconds\n", ( time() - $start ) / $ITERS +; __END__ Shift&and took: 0.000000783860 seconds Lookup took: 0.000001267049 seconds Lookup3 took: 0.000001018672 seconds Lookup4 took: 0.000001903985 seconds Lookup5 took: 0.000002110766 seconds Lookup6 took: 0.000001607903 seconds Lookup7 took: 0.000001791258 seconds


Comment on Re^3: Efficient bit-twiddling in Perl.
Download Code
Re^4: Efficient bit-twiddling in Perl.
by BrowserUk (Pope) on Mar 01, 2013 at 12:40 UTC
    Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.

    The version of my benchmark I posted was the one I use -- in conjunction with -N=1 and uncommenting the print "@_"; -- to check that the output from all the methods is correct.

    For timing runs, I operate on the loop counter $_, which when -N=10e6, is enough to exercise the full range of the table 38 times.

    And I still get 20%+ improvement from the AoAs lookup:

    Results

    C:\test>1021064 Shift&and took: 0.000000606833 seconds Lookup took: 0.000000485629 seconds Lookup2 took: 0.000000669586 seconds (un)pack took: 0.000007564075 seconds

    In general, I've found it very difficult to detect, much less measure any discernible effect from cpu caching in perl code on my machine. It shows up readily in tight loops in C code, but Perl's memory allocation is so cache unfriendly that it rarely seems to come into play.

    I would be interested to see the output from the above on your machine if you have a coupe of minutes.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      $ perl /tmp/1021241.pl Shift&and took: 0.000000844352 seconds Lookup took: 0.000000794679 seconds Lookup2 took: 0.000000945985 seconds (un)pack took: 0.000007257208 seconds

        Less, but still a saving. Thanks. What cpu are you using?

      tables generated sample data generated Shift&and took: 0.000000283640 seconds Lookup took: 0.000000553529 seconds Lookup3 took: 0.000000308301 seconds Lookup4 took: 0.000000435056 seconds Lookup5 took: 0.000000487484 seconds Lookup6 took: 0.000000454496 seconds Lookup7 took: 0.000000476669 seconds Linux 3.4.28-2.20-desktop [openSUSE 12.2 (Mantis)] x86_64 Xeon(R) CPU E5-1650 0 @ 3.20GHz/1200(12) x86_64 16009 Mb This is perl 5, version 16, subversion 2 (v5.16.2) built for x86_64-li +nux-ld

      I think that proves my point in Re^3: Efficient bit-twiddling in Perl.


      Enjoy, Have FUN! H.Merijn
        I think that proves my point in Re^3: Efficient bit-twiddling in Perl.
        I am stunned by the 20% gain you get, and wonder if that would differ per perl version and/or architecture.

        It absolutely will, but as I said in the OP, this is for me, on my hardware. Not an exemplar for anyone else.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      Shift&and took: 0.000000288148 seconds Lookup took: 0.000000252554 seconds Lookup2 took: 0.000000339390 seconds (un)pack took: 0.000002284377 seconds

      Enjoy, Have FUN! H.Merijn

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://1021200]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (4)
As of 2014-07-10 05:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    When choosing user names for websites, I prefer to use:








    Results (199 votes), past polls