
Efficient bit-twiddling in Perl.

by BrowserUk (Pope)
on Feb 28, 2013 at 14:47 UTC ( #1021064=perlquestion )
BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I need to break unsigned 32-bit values into 4 parts (from the msb): 14 bits, 6 bits, 6 bits, 6 bits, using Perl rather than Inline::C or XS. That can be done as follows:

#! perl -slw
use strict;

my $n = 0x80061861;

my $top14 = ( $n & 0xfffc0000 ) >> 18;
my $nxt6  = ( $n & 0x0003f000 ) >> 12;
my $mid6  = ( $n & 0x00000fc0 ) >>  6;
my $bot6  = ( $n & 0x0000003f );

print "$n => $top14 : $nxt6 : $mid6 : $bot6";
__END__
C:\test>bit
2147883105 => 8193 : 33 : 33 : 33

But I need to do this millions of times, so I'm looking to see if there is a more efficient way. (I don't care whether I get 4 separate variables or an array with 4 values.)

Note: This is for me; no one else. Elegance, readability and maintainability are simply not considerations here. Just speed.

Any suggestions?


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re: Efficient bit-twiddling in Perl.
by salva (Monsignor) on Feb 28, 2013 at 15:17 UTC
    Not a big improvement, but you can eliminate the AND operation when calculating $top14: the right shift by 18 already discards the low 18 bits, so for a 32-bit value the mask is redundant.
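
    As a one-line illustration of that (my addition; it assumes $n really is an unsigned 32-bit value, so there are no higher bits left after the shift):

    my $top14 = $n >> 18;   # same result as ( $n & 0xfffc0000 ) >> 18 for 32-bit $n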

    If you have to do that millions of times, you can try building a table for the three combined 6-bit quantities:

    my @table;
    $table[$_] = [..., ..., ...] for 0..0x0003ffff;
    ...
    for my $n (...) {
        my $top = $n >> 18;
        my ($nxt, $mid, $bot) = @{$table[$n & 0x0003ffff]};
        ...
    }
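
    One way the elided entries could be filled in (my sketch, not salva's code; BrowserUk's benchmark further down builds essentially this table):

    my @table;
    $table[$_] = [ ($_ >> 12) & 0x3f, ($_ >> 6) & 0x3f, $_ & 0x3f ]
        for 0 .. 0x0003ffff;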
      you can try building a table for the three combined 6-bit quantities:

      The lookup table gains about 20%. Less than I expected, but worth having. Thank you.


Re: Efficient bit-twiddling in Perl.
by dave_the_m (Parson) on Feb 28, 2013 at 15:51 UTC
    Note that the code you've got is already fairly efficient. Consider the following:
    my $n = 0x80061861;

    for (1..10_000_000) {
        my $top14 = ( $n & 0xfffc0000 ) >> 18;
        my $nxt6  = ( $n & 0x0003f000 ) >> 12;
        my $mid6  = ( $n & 0x00000fc0 ) >>  6;
        my $bot6  = ( $n & 0x0000003f );
        # or replace the four lines above with:
        #my ($top14, $nxt6, $mid6, $bot6) = (1,1,1,1);
    }
    Running as-is on my system takes 3.1s; using just the last line (which represents a baseline of creating, setting and freeing those four lexical vars) takes 1.9s. So any saving you make has to come out of the remaining 1.2s.

    Dave.

      See Re^2: Efficient bit-twiddling in Perl. for my starting position.


Re: Efficient bit-twiddling in Perl.
by Tux (Monsignor) on Feb 28, 2013 at 16:58 UTC

    Just to test whether it would bring any speed gain (which I did not expect), I fiddled with pack and unpack. Before doing the hard part, I looked at the speeds halfway. It is not worth pursuing.

    I also looked at Inline::C, and it is slower than your Perl code (and I had already optimized the stack handling).

    $ cat test.pl
    use 5.016;
    use warnings;
    use Inline "C";
    use Benchmark qw( cmpthese );

    my $n = 0x80061861;
    my $x = 10000;

    sub andshift {
        my ($top14, $nxt6, $mid6, $bot6);
        for (1 .. $x) {
            $top14 = ($n & 0xfffc0000) >> 18;
            $nxt6  = ($n & 0x0003f000) >> 12;
            $mid6  = ($n & 0x00000fc0) >>  6;
            $bot6  = ($n & 0x0000003f);
        }
        return ($top14, $nxt6, $mid6, $bot6);
    } # andshift

    sub packunpack {
        my ($top14, $nxt6, $mid6, $bot6);
        for (1 .. $x) {
            ($top14, $nxt6, $mid6, $bot6) =
                unpack "A14 A6 A6 A6", unpack "B32", => pack "N" => $n;
        }
        return ($top14, $nxt6, $mid6, $bot6);
    } # packunpack

    sub inlined {
        my ($top14, $nxt6, $mid6, $bot6);
        for (1 .. $x) {
            ($top14, $nxt6, $mid6, $bot6) = split14666 ($n);
        }
        return ($top14, $nxt6, $mid6, $bot6);
    } # inlined

    say for andshift ();
    say for packunpack ();
    say for split14666 ($n);

    cmpthese (-1, {
        andshift   => \&andshift,
        bitstrings => \&packunpack,
        inline_c   => \&inlined,
    });

    __END__
    __C__

    void split14666 (int n)
    {
        Inline_Stack_Vars;
        Inline_Stack_Reset;
        EXTEND (sp, 4);
        mPUSHi (((unsigned int)n & 0xfffc0000) >> 18);
        mPUSHi (((unsigned int)n & 0x0003f000) >> 12);
        mPUSHi (((unsigned int)n & 0x00000fc0) >>  6);
        mPUSHi (((unsigned int)n & 0x0000003f)      );
        Inline_Stack_Done;
    } /* split14666 */

    $ perl test.pl
    8193
    33
    33
    33
    10000000000001
    100001
    100001
    100001
    8193
    33
    33
    33
                  Rate bitstrings inline_c andshift
    bitstrings   121/s         --     -52%     -74%
    inline_c     250/s       108%       --     -47%
    andshift     470/s       290%      88%       --

    Enjoy, Have FUN! H.Merijn
      Before doing the hard part, I looked at the speeds halfway. It is not worth pursuing.

      Indeed. My initial thought for this was to use unpack, and I came up with this unholy construct:

      unpack 'vccc',
          join '', map{ pack 'b*', $_ }
              unpack 'a14a6a6a6',
                  unpack 'B32', pack 'N', $n;

      Which does the job but (not unexpectedly) is an order of magnitude slower than my second attempt in the OP.

      That was a pretty good saving, but whenever I have asked this type of question in the past, someone has always come up with a better solution.

      And indeed, salva has; albeit only a further 20% saving. I also tried using separate lookup tables to save a dereference to no avail:

      #! perl -slw
      use strict;
      use Time::HiRes qw[ time ];

      my @lookup; $#lookup = 0x3ffff;
      $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
          for 0 .. 0x3ffff;

      my( @nxt, @mid, @bot );
      $nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
      $mid[ $_ ] = ( $_ & 0xfc0   ) >>  6,
      $bot[ $_ ] =   $_ & 0x3f
          for 0 .. 0x3ffff;

      sub stuff{
      #    print "@_";
      }

      our $ITERS //= 10e6;
      my $n = 0x80061861;

      my $start = time;
      for ( 1 .. $ITERS ) {
          stuff(
              ( $n & 0xfffc0000 ) >> 18,
              ( $n & 0x0003f000 ) >> 12,
              ( $n & 0x00000fc0 ) >>  6,
              ( $n & 0x0000003f )
          );
      }
      printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      for ( 1 .. $ITERS ) {
          stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
      }
      printf "   Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      for ( 1 .. $ITERS ) {
          my $i = $n & 0x3ffff;
          stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
      }
      printf "  Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      $start = time;
      for ( 1 .. $ITERS ) {
          stuff( unpack 'vccc', join'', map{ pack 'b*', $_ }
              unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
      }
      printf " (un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

      __END__
      C:\test>1021064
      Shift&and took: 0.000000482421 seconds
         Lookup took: 0.000000386419 seconds
        Lookup2 took: 0.000000547556 seconds
       (un)pack took: 0.000005933478 seconds

        I'll qualify this result by saying that salva's approach doesn't necessarily buy you anything if you are only doing this a few million times. On my machine, I get the timings:
        Generation took: 0.218580069542 seconds
         Shift&and took: 0.000000471000 seconds
            Lookup took: 0.000000395095 seconds
        The generation time is measured with:
        my $start = time;
        for (1..100) {
            my @lookup; $#lookup = 0x3ffff;
            $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
                for 0 .. 0x3ffff;
        }
        printf "Generation took: %.12f seconds\n", ( time() - $start ) / 100;
        I ran the transforms for 10^7 iterations, though frankly all metrics were still a little volatile for my taste. This puts break-even at 2.9 million iterations and a 5% speed-up at 4.2 million. YMMV.
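
        For reference, the arithmetic behind those figures, as a quick sketch using the timings above (the one-off table generation cost divided by the per-iteration saving):

        my $gen              = 0.218580069542;                   # one-off table generation
        my ($shift, $lookup) = (0.000000471000, 0.000000395095); # per-iteration costs
        printf "break-even: %.1fM iterations\n", $gen / ($shift - $lookup) / 1e6;         # ~2.9M
        printf "5%% faster:  %.1fM iterations\n", $gen / (0.95 * $shift - $lookup) / 1e6; # ~4.2M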

        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

        Bit::Vector as you might guess isn't any better...

        use strict;
        use warnings;
        use Time::HiRes qw[ time ];
        use Bit::Vector;

        my @lookup; $#lookup = 0x3ffff;
        $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
            for 0 .. 0x3ffff;

        my( @nxt, @mid, @bot );
        $nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
        $mid[ $_ ] = ( $_ & 0xfc0   ) >>  6,
        $bot[ $_ ] =   $_ & 0x3f
            for 0 .. 0x3ffff;

        sub stuff{
        #    print "@_";
        }

        our $ITERS //= 10e6;
        my $n = 0x80061861;

        my $start = time;
        for ( 1 .. $ITERS ) {
            stuff(
                ( $n & 0xfffc0000 ) >> 18,
                ( $n & 0x0003f000 ) >> 12,
                ( $n & 0x00000fc0 ) >>  6,
                ( $n & 0x0000003f )
            );
        }
        printf "  Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for ( 1 .. $ITERS ) {
            stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
        }
        printf "     Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for ( 1 .. $ITERS ) {
            my $i = $n & 0x3ffff;
            stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
        }
        printf "    Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for ( 1 .. $ITERS ) {
            stuff( unpack 'vccc', join'', map{ pack 'b*', $_ }
                unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
        }
        printf "   (un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $n = "$n";
        my $vector;
        $start = time;
        for ( 1 .. $ITERS ) {
            $vector = Bit::Vector->new_Hex(32, $n);
            stuff( $vector->Chunk_Read(14, 18),
                   $vector->Chunk_Read( 6, 12),
                   $vector->Chunk_Read( 6,  6),
                   $vector->Chunk_Read( 6,  0) );
        }
        printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        __END__
        C:\test>perl bv.pl
          Shift&and took: 0.000000551174 seconds
             Lookup took: 0.000000272709 seconds
            Lookup2 took: 0.000000335823 seconds
           (un)pack took: 0.000003393702 seconds
        Bit::Vector took: 0.000005372898 seconds

        I think you can shave off a lot in XS (or Inline::C) when you switch to pre-bound variables, à la bind_columns() in DBI or Text::CSV_XS, as much of the XS overhead is stack work. That said, it is then still a function call, so the stack will be involved. If you pre-bind both the in- AND out-parameters, you don't need to push variables onto the stack or pop return values off the stack. That should speed things up quite a bit.
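
        A rough, untested sketch of what that pre-binding could look like with Inline::C (my own illustration, not Tux's code; bind_outputs() and split_bound() are made-up names): bind the four output scalars once, then have the C side write into them directly with sv_setuv(), so only $n ever crosses the stack.

        use strict;
        use warnings;
        use Inline "C";

        my ($top14, $nxt6, $mid6, $bot6);
        bind_outputs ($top14, $nxt6, $mid6, $bot6);   # one-off
        split_bound (0x80061861);                     # fills the bound scalars
        print "$top14 : $nxt6 : $mid6 : $bot6\n";     # 8193 : 33 : 33 : 33

        __END__
        __C__

        static SV *out[4];

        /* remember the four output SVs so later calls can fill them directly */
        void bind_outputs (SV *a, SV *b, SV *c, SV *d)
        {
            out[0] = SvREFCNT_inc (a);
            out[1] = SvREFCNT_inc (b);
            out[2] = SvREFCNT_inc (c);
            out[3] = SvREFCNT_inc (d);
        }

        /* the 14/6/6/6 split, written straight into the pre-bound scalars */
        void split_bound (unsigned int n)
        {
            sv_setuv (out[0], (n & 0xfffc0000) >> 18);
            sv_setuv (out[1], (n & 0x0003f000) >> 12);
            sv_setuv (out[2], (n & 0x00000fc0) >>  6);
            sv_setuv (out[3],  n & 0x0000003f);
        }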

        The next step could be looking into op hooks, so there are no real Perl-level function calls any more; but looking at what you already have, I wonder whether it would be worth the effort (other than to learn from it).

        The lookup solutions still have array-index ops behind the scenes. I am stunned by the 20% gain you get, and wonder whether that would differ per Perl version and/or architecture.

        Anyway, the solution that is fastest on the machine the problem is solved on is most important, even if another method would be four times faster on z/OS with perl4 :)


        Enjoy, Have FUN! H.Merijn
        Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.

        If you use a random $n, you will see that the table approach actually becomes quite a bit slower than the simple one.

        I have tried encoding the table in other ways, but have not been able to find one that is good enough:

        #! perl -slw
        use strict;
        use Time::HiRes qw[ time ];

        my @lookup; $#lookup = 0x3ffff;
        $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
            for 0 .. 0x3ffff;

        my( @nxt, @mid, @bot );
        $nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
        $mid[ $_ ] = ( $_ & 0xfc0   ) >>  6,
        $bot[ $_ ] =   $_ & 0x3f
            for 0 .. 0x3ffff;

        my (@lookup3); $#lookup3 = 0x3ffff;
        $lookup3[$_ << 6] = [$_ >> 6, $_ & 0x3f] for 0 .. 0xfff;

        my $lookup4 = 'x' x (3 * (1<<18));
        $lookup4 = '';
        $lookup4 .= pack CCC => $_ >> 12, ($_ >> 6) & 0x3f, $_ & 0x3f for 0..0x3ffff;

        my $lookup6 = 'x' x (2 * (1<<12));
        $lookup6 = '';
        $lookup6 .= pack CC => $_ >> 6, $_ & 0x3f for 0..0xfff;

        print "tables generated";

        our $ITERS //= 10e6;
        my @n = map int(rand(1<<18)), 1..$ITERS;
        print "sample data generated";

        sub stuff{
        #    print "@_";
        }

        my $start = time;
        for my $n (@n) {
            stuff(
                ( $n ) >> 18,
                ( $n & 0x0003f000 ) >> 12,
                ( $n & 0x00000fc0 ) >>  6,
                ( $n & 0x0000003f )
            );
        }
        printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for my $n (@n) {
            stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
        }
        printf "  Lookup took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for my $n (@n) {
            stuff( $n >> 18, @{ $lookup3[$n & 0x3ffc0] }, $n & 0x3f );
        }
        printf " Lookup3 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for my $n (@n) {
            stuff( $n >> 18, unpack CCC => substr($lookup4, 3 * ($n & 0x3ffff), 3) );
        }
        printf " Lookup4 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for my $n (@n) {
            stuff( $n >> 18, unpack 'x'.(3 * ($n & 0x3ffff)).'CCC' => $lookup4 );
        }
        printf " Lookup5 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for my $n (@n) {
            stuff( $n >> 18, unpack(CC => substr($lookup6, ($n & 0x3ffc0) >> 5, 3)), $n & 0x3f );
        }
        printf " Lookup6 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        $start = time;
        for my $n (@n) {
            stuff( $n >> 18, unpack('x'.(($n & 0x3ffc0) >> 5).'CC', $lookup6), $n & 0x3f );
        }
        printf " Lookup7 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

        __END__
        Shift&and took: 0.000000783860 seconds
          Lookup took: 0.000001267049 seconds
         Lookup3 took: 0.000001018672 seconds
         Lookup4 took: 0.000001903985 seconds
         Lookup5 took: 0.000002110766 seconds
         Lookup6 took: 0.000001607903 seconds
         Lookup7 took: 0.000001791258 seconds
Re: Efficient bit-twiddling in Perl.
by kcott (Abbot) on Mar 01, 2013 at 08:56 UTC

    G'day BrowserUk,

    I tried this a few different ways. Shifting first and then ANDing just the six bits seems to be faster than ANDing and then shifting. About half of the time seemed to be taken up by the assignments, so if you can avoid some or all of those you get additional improvements. Here are the various tests:

    #!/usr/bin/env perl

    use 5.010;
    use strict;
    use warnings;
    use Benchmark qw{cmpthese :hireswallclock};

    my $n = 0x80061861;

    say "and_rshift: $n => ", join(' : ', and_rshift());
    say "lshift_rshift: $n => ", join(' : ', lshift_rshift());
    say "mixed: $n => ", join(' : ', mixed());
    say "mixed_assign: $n => ", join(' : ', mixed_assign());
    say "just_shift: $n => ", join(' : ', just_shift());

    cmpthese(-10, {
        and_rshift    => \&and_rshift,
        lshift_rshift => \&lshift_rshift,
        mixed         => \&mixed,
        mixed_assign  => \&mixed_assign,
        just_shift    => \&just_shift,
    });

    sub and_rshift {
        my $top14 = ( $n & 0xfffc0000 ) >> 18;
        my $nxt6  = ( $n & 0x0003f000 ) >> 12;
        my $mid6  = ( $n & 0x00000fc0 ) >>  6;
        my $bot6  = ( $n & 0x0000003f );
        return ($top14, $nxt6, $mid6, $bot6);
    }

    sub lshift_rshift {
        my $top14 = $n >> 18;
        my $nxt6  = $n << 46 >> 58;
        my $mid6  = $n << 52 >> 58;
        my $bot6  = $n << 58 >> 58;
        return ($top14, $nxt6, $mid6, $bot6);
    }

    sub mixed {
        return ($n >> 18, $n >> 12 & 0x3f, $n >> 6 & 0x3f, $n & 0x3f);
    }

    sub mixed_assign {
        my $top14 = $n >> 18;
        my $nxt6  = $n >> 12 & 0x3f;
        my $mid6  = $n >>  6 & 0x3f;
        my $bot6  = $n & 0x3f;
        return ($top14, $nxt6, $mid6, $bot6);
    }

    sub just_shift {
        return ($n >> 18, $n << 46 >> 58, $n << 52 >> 58, $n << 58 >> 58);
    }

    Output:

    $ pm_bit_twiddle.pl
    and_rshift: 2147883105 => 8193 : 33 : 33 : 33
    lshift_rshift: 2147883105 => 8193 : 33 : 33 : 33
    mixed: 2147883105 => 8193 : 33 : 33 : 33
    mixed_assign: 2147883105 => 8193 : 33 : 33 : 33
    just_shift: 2147883105 => 8193 : 33 : 33 : 33
                       Rate and_rshift lshift_rshift mixed_assign just_shift mixed
    and_rshift    1942328/s         --           -7%         -11%       -52%  -53%
    lshift_rshift 2088397/s         8%            --          -4%       -48%  -50%
    mixed_assign  2182671/s        12%            5%           --       -46%  -48%
    just_shift    4037158/s       108%           93%          85%         --   -3%
    mixed         4163497/s       114%           99%          91%         3%    --

    -- Ken

      I fed your just_shift solution (a cute bit of lateral thinking :) into my benchmark, and on my hardware it is a tad slower than shift&and:

      C:\test>1021064
      Shift&and took: 0.000000611130 seconds
      justshift took: 0.000000651048 seconds
         Lookup took: 0.000000488745 seconds
        Lookup2 took: 0.000000706896 seconds
       (un)pack took: 0.000007464467 seconds

      I was a bit surprised by that, as both shift and AND are designated as single-clock operations on my CPU. I guess the difference is in the Perl ops that wrap them.

      I follow your drift regarding the effect that assignment and stack handling can have on a benchmark of this nature; but the values are re-used several times through the rest of the body of the inner loop, so the assignment is a constant factor that I've chosen to exclude.



      When you reduce the overhead of the function calls (by doing the work you actually time in a tight loop), you might get rather different numbers:

      and_rshift: 2147883105 => 8193 : 33 : 33 : 33
      lshift_rshift: 2147883105 => 8193 : 33 : 33 : 33
      mixed: 2147883105 => 8193 : 33 : 33 : 33
      mixed_assign: 2147883105 => 8193 : 33 : 33 : 33
      just_shift: 2147883105 => 8193 : 33 : 33 : 33
                     Rate just_shift mixed mixed_assign and_rshift lshift_rshift
      just_shift    247/s         --   -2%         -43%       -48%          -52%
      mixed         252/s         2%    --         -42%       -47%          -52%
      mixed_assign  434/s        75%   72%           --        -9%          -16%
      and_rshift    480/s        94%   90%          10%         --           -8%
      lshift_rshift 520/s       110%  106%          20%         8%            --
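
      (For reference, a guess at that restructuring, not Tux's actual code: each benchmarked sub repeats the split in its own inner loop, so the per-call overhead is amortised over many splits. It would slot into kcott's script above, where $n is a file-scoped lexical.)

      sub mixed_tight {
          my ($top14, $nxt6, $mid6, $bot6);
          # do the split many times per call; cmpthese() then times the loop, not the call
          for (1 .. 10_000) {
              ($top14, $nxt6, $mid6, $bot6) =
                  ($n >> 18, $n >> 12 & 0x3f, $n >> 6 & 0x3f, $n & 0x3f);
          }
          return ($top14, $nxt6, $mid6, $bot6);
      }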

      Enjoy, Have FUN! H.Merijn
Re: Efficient bit-twiddling in Perl.
by flexvault (Parson) on Mar 01, 2013 at 11:44 UTC

    BrowserUk,

    This is more about the source of your input (32-bit) stream. If it is just (0..2**32) and then repeats, then it looks like you already have your solution.

    But if the input stream is being generated from an analog or digital source and the encoding could contain repetition, then using Memoize or caching the results in a hash might give you some additional performance.
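
    A minimal sketch of that caching idea (my illustration; it only pays off if values repeat often enough that a hash lookup beats redoing the four shift/mask ops):

    my %cache;
    sub split_cached {
        my $n = shift;
        # compute the 14/6/6/6 split once per distinct value, reuse it thereafter
        my $parts = $cache{$n} //= [ $n >> 18, $n >> 12 & 0x3f, $n >> 6 & 0x3f, $n & 0x3f ];
        return @$parts;
    }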

    I once saw an IBM demo that compressed 1 minute of full-color video of a football game onto a 360K floppy. IBM's explanation was that so much of the data was repetitious that only the differences from frame to frame were being saved on the floppy.

    Just a thought...Ed

    "Well done is better than well said." - Benjamin Franklin
