
### Re: Efficient bit-twiddling in Perl.

by Tux (Abbot)
 on Feb 28, 2013 at 16:58 UTC (note #1021079)

in reply to Efficient bit-twiddling in Perl.

Just to test whether it would bring any speed gain (which I did not expect), I fiddled with pack and unpack. Before doing the hard part, I checked the speeds halfway through. It is not worth pursuing.

I also looked at Inline::C, and it is slower than your Perl code (and I already optimized the stack handling).

```
$ cat test.pl
use 5.016;
use warnings;

use Inline "C";
use Benchmark qw( cmpthese );

my $n = 0x80061861;
my $x = 10000;

sub andshift
{
    my ($top14, $nxt6, $mid6, $bot6);
    for (1 .. $x) {
        $top14 = ($n & 0xfffc0000) >> 18;
        $nxt6  = ($n & 0x0003f000) >> 12;
        $mid6  = ($n & 0x00000fc0) >>  6;
        $bot6  = ($n & 0x0000003f);
    }
    return ($top14, $nxt6, $mid6, $bot6);
} # andshift

sub packunpack
{
    my ($top14, $nxt6, $mid6, $bot6);
    for (1 .. $x) {
        ($top14, $nxt6, $mid6, $bot6) =
            unpack "A14 A6 A6 A6", unpack "B32" => pack "N" => $n;
    }
    return ($top14, $nxt6, $mid6, $bot6);
} # packunpack

sub inlined
{
    my ($top14, $nxt6, $mid6, $bot6);
    for (1 .. $x) {
        ($top14, $nxt6, $mid6, $bot6) = split14666 ($n);
    }
    return ($top14, $nxt6, $mid6, $bot6);
} # inlined

say for andshift ();
say for packunpack ();
say for split14666 ($n);

cmpthese (-1, {
    andshift   => \&andshift,
    bitstrings => \&packunpack,
    inline_c   => \&inlined,
    });
__END__
__C__
void split14666 (int n)
{
    Inline_Stack_Vars;
    Inline_Stack_Reset;
    EXTEND (sp, 4);
    mPUSHi (((unsigned int)n & 0xfffc0000) >> 18);
    mPUSHi (((unsigned int)n & 0x0003f000) >> 12);
    mPUSHi (((unsigned int)n & 0x00000fc0) >>  6);
    mPUSHi (((unsigned int)n & 0x0000003f)      );
    Inline_Stack_Done;
} /* split14666 */
$ perl test.pl
8193
33
33
33
10000000000001
100001
100001
100001
8193
33
33
33
             Rate bitstrings   inline_c   andshift
bitstrings  121/s         --       -52%       -74%
inline_c    250/s       108%         --       -47%
andshift    470/s       290%        88%         --
```

Enjoy, Have FUN! H.Merijn

### Re^2: Efficient bit-twiddling in Perl.

by BrowserUk (Pope) on Feb 28, 2013 at 17:28 UTC
> Before doing the hard part, I looked at the speeds halfway. It is not worth pursuing.

Indeed. My initial thought for this was to use unpack, and I came up with this unholy construct:

```
unpack 'vccc', join '', map { pack 'b*', $_ } unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n;
```
It does the job, but (not unexpectedly) it is an order of magnitude slower than my second attempt in the OP.
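For readers puzzling over the one-liner, here is the same construct unrolled step by step (a sketch; the intermediate variable names are mine):

```perl
use strict;
use warnings;

my $n = 0x80061861;

# Step 1: render the 32-bit big-endian value as a string of '0'/'1' chars.
my $bits = unpack 'B32', pack 'N', $n;   # "10000000000001100001100001100001"

# Step 2: split that string into 14-, 6-, 6- and 6-character substrings.
my @chunks = unpack 'a14a6a6a6', $bits;

# Step 3: re-pack each substring as bits ('b*' = ascending bit order) and
# read the bytes back as one little-endian short plus three signed chars.
my @fields = unpack 'vccc', join '', map { pack 'b*', $_ } @chunks;
# @fields is now (8193, 33, 33, 33)
```

The switch from descending ('B') to ascending ('b') bit order in step 3 is what makes the little-endian 'v' and 'c' formats come out with the right values.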

That was a pretty good saving, but whenever I have asked these types of questions in the past, someone has always come up with a better solution.

And indeed, salva has; albeit only a further 20% saving. I also tried using separate lookup tables to save a dereference, to no avail:

```
#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my @lookup; $#lookup = 0x3ffff;
$lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
    for 0 .. 0x3ffff;

my( @nxt, @mid, @bot );
$nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
$mid[ $_ ] = ( $_ & 0xfc0 ) >> 6,
$bot[ $_ ] = $_ & 0x3f
    for 0 .. 0x3ffff;

sub stuff{
#    print "@_";
}

our $ITERS //= 10e6;

my $n = 0x80061861;

my $start = time;
for ( 1 .. $ITERS ) {
    stuff(
        ( $n & 0xfffc0000 ) >> 18,
        ( $n & 0x0003f000 ) >> 12,
        ( $n & 0x00000fc0 ) >>  6,
        ( $n & 0x0000003f )
    );
}
printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
}
printf "  Lookup  took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    my $i = $n & 0x3ffff;
    stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
}
printf "  Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( unpack 'vccc', join '', map { pack 'b*', $_ } unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
}
printf "(un)pack  took: %.12f seconds\n", ( time() - $start ) / $ITERS;

__END__
C:\test>1021064
Shift&and took: 0.000000482421 seconds
  Lookup  took: 0.000000386419 seconds
  Lookup2 took: 0.000000547556 seconds
(un)pack  took: 0.000005933478 seconds
```

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I'll qualify this result by saying that salva's result doesn't necessarily buy you anything if you are only doing this millions of times. On my machine, I get these timings:
```
Generation took: 0.218580069542 seconds
 Shift&and took: 0.000000471000 seconds
   Lookup  took: 0.000000395095 seconds
```
where generation is calculated with:
```
my $start = time;
for (1..100) {
    my @lookup; $#lookup = 0x3ffff;
    $lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
        for 0 .. 0x3ffff;
}
printf "Generation took: %.12f seconds\n", ( time() - $start ) / 100;
```
I ran the transforms for 10^7 iterations, though frankly all the metrics were still a little volatile for my taste. This puts break-even at 2.9 million iterations and a 5% speed-up at 4.2 million. YMMV.
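As a cross-check, the break-even figures follow directly from the timings quoted above (a small sketch; the constants are the ones from this reply):

```perl
use strict;
use warnings;

my $generation = 0.218580069542;   # one-off table-build cost, seconds
my $shiftand   = 0.000000471000;   # per-iteration cost, shift-and-mask
my $lookup     = 0.000000395095;   # per-iteration cost, lookup table

# Break-even: iterations at which the saved per-iteration time
# repays the table-build cost.
my $breakeven = $generation / ( $shiftand - $lookup );
printf "break-even at %.1f million iterations\n", $breakeven / 1e6;      # ~2.9

# Iterations needed before the lookup approach is a net 5% faster.
my $five_pct = $generation / ( 0.95 * $shiftand - $lookup );
printf "5%% net speed-up at %.1f million iterations\n", $five_pct / 1e6; # ~4.2
```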

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

The lookup-table generation is 2/10ths of a second, a one-off cost at startup.

Completely insignificant compared to the cost of the inner-loop lookup code, which could theoretically be executed 4 billion times (once for each 32-bit number). More typically it runs mid-hundreds to low thousands of millions of times.

A small saving overall, but this is only one part of the inner-loop code I am optimising. It is already down from a couple of weeks to ~4 days.


Bit::Vector, as you might guess, isn't any better...

```
use strict;
use warnings;
use Time::HiRes qw[ time ];
use Bit::Vector;

my @lookup; $#lookup = 0x3ffff;
$lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
    for 0 .. 0x3ffff;

my( @nxt, @mid, @bot );
$nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
$mid[ $_ ] = ( $_ & 0xfc0 ) >> 6,
$bot[ $_ ] = $_ & 0x3f
    for 0 .. 0x3ffff;

sub stuff{
#    print "@_";
}

our $ITERS //= 10e6;

my $n = 0x80061861;

my $start = time;
for ( 1 .. $ITERS ) {
    stuff(
        ( $n & 0xfffc0000 ) >> 18,
        ( $n & 0x0003f000 ) >> 12,
        ( $n & 0x00000fc0 ) >>  6,
        ( $n & 0x0000003f )
    );
}
printf "  Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
}
printf "    Lookup  took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    my $i = $n & 0x3ffff;
    stuff( $n >> 18, $nxt[ $i ], $mid[ $i ], $bot[ $i ] );
}
printf "    Lookup2 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for ( 1 .. $ITERS ) {
    stuff( unpack 'vccc', join '', map { pack 'b*', $_ } unpack 'a14a6a6a6', unpack 'B32', pack 'N', $n );
}
printf "   (un)pack took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$n = "$n";
my $vector;
$start = time;
for (1 .. $ITERS ) {
    $vector = Bit::Vector->new_Hex(32, $n);
}
printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;
__END__

C:\test>perl bv.pl
   Shift&and took: 0.000000551174 seconds
     Lookup  took: 0.000000272709 seconds
     Lookup2 took: 0.000000335823 seconds
    (un)pack took: 0.000003393702 seconds
 Bit::Vector took: 0.000005372898 seconds
```
The Bit::Vector code needs new_Dec, not new_Hex.

Here are some additions to your code showing Data::BitStream::XS and two alternatives using Bit::Vector. Neither is an improvement over the fast simple methods, but they do show that both are at least reasonable solutions:
```
use Data::BitStream::XS;

$start = time;
my $stream = Data::BitStream::XS->new;
# $stream->write(32, $n) for 1 .. $ITERS;   # the line below is faster
$stream->put_raw(pack("N", $n) x $ITERS, 32 * $ITERS);
for (1 .. $ITERS ) {
}
printf "   D::B::XS took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for (1 .. $ITERS ) {
    my $vector = Bit::Vector->new_Dec(32, $n);
}
printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
my $vector = Bit::Vector->new(32 * $ITERS);
$vector->Word_List_Store(($n) x $ITERS);
for (0 .. $ITERS-1 ) {
}
printf "Bit::Vector took: %.12f seconds\n", ( time() - $start ) / $ITERS;
```
I get on my computer:
```
  Shift&and took: 0.000000193217 seconds
    Lookup  took: 0.000000180829 seconds
    Lookup2 took: 0.000000215928 seconds
   (un)pack took: 0.000001851533 seconds
   D::B::XS took: 0.000000643722 seconds
Bit::Vector took: 0.000001739788 seconds
Bit::Vector took: 0.000000957755 seconds
```
It really depends on the usage model. Both Data::BitStream and Bit::Vector are going to suffer if you insist on making a new object for each word. They do much better if you let them create a single object, fill it, then pull the data out in arbitrarily sized chunks. If the OP had all the values in one chunk or array, this would be more appropriate.

One could also write a new coding role for Data::BitStream to handle retrieving the values using code similar to one of the lookup methods, but there would have to be some real reason to want a stream vs. just doing it using one of the lookup methods.
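For illustration, the fill-once-then-read-in-chunks pattern might look like this with Bit::Vector (a sketch only: Chunk_Read and Word_List_Store are documented Bit::Vector methods, and, like the benchmark above, it assumes 32-bit machine words; the field layout is the 14/6/6/6 split from the thread):

```perl
use strict;
use warnings;
use Bit::Vector;

my @values = (0x80061861) x 4;                 # sample 32-bit words
my $vector = Bit::Vector->new(32 * @values);   # one vector for all the words
$vector->Word_List_Store(@values);             # fill it in a single call

# Pull each word back out as 6+6+6+14-bit chunks (low bits first).
for my $i (0 .. $#values) {
    my $off   = 32 * $i;
    my $bot6  = $vector->Chunk_Read( 6, $off      );
    my $mid6  = $vector->Chunk_Read( 6, $off +  6 );
    my $nxt6  = $vector->Chunk_Read( 6, $off + 12 );
    my $top14 = $vector->Chunk_Read(14, $off + 18 );
    # for this sample value: $top14 == 8193, the other three == 33
}
```

The point is that the object-construction overhead is paid once for the whole batch rather than once per word.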

Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.

If you use a random $n, you will see that the table approach actually becomes quite a bit slower than the simple one.

I have tried encoding the table in other ways, but have not found one good enough:

```
#! perl -slw
use strict;
use Time::HiRes qw[ time ];

my @lookup; $#lookup = 0x3ffff;
$lookup[ $_ ] = [ ( $_ & 0x3f000 ) >> 12, ( $_ & 0xfc0 ) >> 6, $_ & 0x3f ]
    for 0 .. 0x3ffff;

my( @nxt, @mid, @bot );
$nxt[ $_ ] = ( $_ & 0x3f000 ) >> 12,
$mid[ $_ ] = ( $_ & 0xfc0 ) >> 6,
$bot[ $_ ] = $_ & 0x3f
    for 0 .. 0x3ffff;

my @lookup3;
$#lookup3 = 0x3ffff;
$lookup3[ $_ << 6 ] = [ $_ >> 6, $_ & 0x3f ] for 0 .. 0xfff;

my $lookup4 = 'x' x (3 * (1 << 18));   # preallocate, then rebuild
$lookup4 = '';
$lookup4 .= pack CCC => $_ >> 12, ($_ >> 6) & 0x3f, $_ & 0x3f for 0 .. 0x3ffff;

my $lookup6 = 'x' x (2 * (1 << 12));
$lookup6 = '';
$lookup6 .= pack CC => $_ >> 6, $_ & 0x3f for 0 .. 0xfff;

print "tables generated";

our $ITERS //= 10e6;
my @n = map int(rand(1 << 18)), 1 .. $ITERS;

print "sample data generated";

sub stuff{
#    print "@_";
}

my $start = time;
for my $n (@n) {
    stuff(
        ( $n              ) >> 18,
        ( $n & 0x0003f000 ) >> 12,
        ( $n & 0x00000fc0 ) >>  6,
        ( $n & 0x0000003f )
    );
}
printf "Shift&and took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, @{ $lookup[ $n & 0x3ffff ] } );
}
printf "  Lookup  took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, @{ $lookup3[ $n & 0x3ffc0 ] }, $n & 0x3f );
}
printf "  Lookup3 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack CCC => substr($lookup4, 3 * ($n & 0x3ffff), 3) );
}
printf "  Lookup4 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack 'x' . (3 * ($n & 0x3ffff)) . 'CCC' => $lookup4 );
}
printf "  Lookup5 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack(CC => substr($lookup6, ($n & 0x3ffc0) >> 5, 3)), $n & 0x3f );
}
printf "  Lookup6 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

$start = time;
for my $n (@n) {
    stuff( $n >> 18, unpack('x' . (($n & 0x3ffc0) >> 5) . 'CC', $lookup6), $n & 0x3f );
}
printf "  Lookup7 took: %.12f seconds\n", ( time() - $start ) / $ITERS;

__END__

Shift&and took: 0.000000783860 seconds
  Lookup  took: 0.000001267049 seconds
  Lookup3 took: 0.000001018672 seconds
  Lookup4 took: 0.000001903985 seconds
  Lookup5 took: 0.000002110766 seconds
  Lookup6 took: 0.000001607903 seconds
  Lookup7 took: 0.000001791258 seconds
```

> Your benchmark is not very realistic. As $n doesn't vary, you are completely ignoring the effect of the CPU cache misses that the lookup table may introduce.

The version of my benchmark I posted was the one I use -- in conjunction with -N=1 and uncommenting the print "@_"; -- to check that the output from all the methods is correct.

For timing runs I operate on the loop counter $_, which, when -N=10e6, is enough to exercise the full range of the table 38 times (10e6 / 0x40000 ≈ 38).

And I still get a 20%+ improvement from the AoA lookup:

Results

```
C:\test>1021064
Shift&and took: 0.000000606833 seconds
  Lookup  took: 0.000000485629 seconds
  Lookup2 took: 0.000000669586 seconds
(un)pack  took: 0.000007564075 seconds
```

In general, I've found it very difficult to detect, much less measure, any discernible effect from CPU caching in Perl code on my machine. It shows up readily in tight loops in C code, but Perl's memory allocation is so cache-unfriendly that it rarely seems to come into play.

I would be interested to see the output from the above on your machine, if you have a couple of minutes.


I think you can shave off a lot in XS (or Inline::C) if you switch to pre-bound variables, à la bind_columns () in DBI or Text::CSV_XS, as much of the XS overhead is stack work. That said, these are still function calls, so the stack will be involved. If you pre-bind both the in- AND the out-parameters, you don't need to push variables onto the stack and pop return values off the stack. That should speed things up quite a bit.
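A rough sketch of that pre-binding idea (my own illustration, untested against the benchmark above; it relies on the standard typemap mapping an array reference to AV*): pass a pre-allocated array reference once and let the C side overwrite its slots, so only one SV travels on the stack per call instead of four return values:

```perl
use strict;
use warnings;
use Inline C => <<'END_C';
void split_into (unsigned int n, AV *out)
{
    /* Overwrite the four pre-existing slots instead of pushing
       four new return values onto the perl stack. */
    sv_setiv (*av_fetch (out, 0, 1), (n & 0xfffc0000) >> 18);
    sv_setiv (*av_fetch (out, 1, 1), (n & 0x0003f000) >> 12);
    sv_setiv (*av_fetch (out, 2, 1), (n & 0x00000fc0) >>  6);
    sv_setiv (*av_fetch (out, 3, 1), (n & 0x0000003f)      );
}
END_C

my @fields = (0) x 4;      # bound once, reused for every call
split_into (0x80061861, \@fields);
# @fields is now (8193, 33, 33, 33)
```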

The next step could be looking into op-hooks, so there are no real Perl-level function calls any more, but looking at what you already have, I wonder if it would be worth the effort (other than to learn from it).

The lookup solutions still have array-index ops behind the scenes. I am stunned by the 20% gain you get, and wonder whether that would differ per Perl version and/or architecture.

Anyway, the solution that is fastest on the machine the problem is actually solved on is what matters most, even if another method would be four times faster on z/OS with perl4 :)

Enjoy, Have FUN! H.Merijn
