### Re^8: Bit vector fiddling with Inline C

by BrowserUk (Pope)
 on May 10, 2011 at 07:06 UTC ( #903908 )

in reply to Re^7: Bit vector fiddling with Inline C
in thread Bit vector fiddling with Inline C

Two questions:

1. Why test the bit and then set it?

If you are only going to set the bit if it is unset, then just setting it achieves the same thing.

2. Why would calling a function to set a bit be quicker than setting the bit directly?

I guess the compiler might inline the function and optimise away the overheads, but I still don't see why it would end up being any quicker. As quick, possibly.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^9: Bit vector fiddling with Inline C
by syphilis (Chancellor) on May 10, 2011 at 23:45 UTC
Why test the bit and then set it?

The "tstbit(i)" is being done on vector A, and the "setbit(i)" is being done on vector B - but only when the "tstbit(i)" returns true.
This is in accordance with the OP's description of finding "every set bit in vector A, then set the corresponding bit(s) in vector B".
Of course, this is a rather naive and inefficient approach. A smarter approach would be to do B = A|B, or, in gmp parlance, mpz_ior(B, A, B);

Why would calling a function to set a bit be quicker than setting the bit directly?

As you suggest, they might not be any quicker, and could even be slower. For single bit operations, you're probably right. (The assembler code on which gmp is generally built could count as something in its favour - but it's not always going to come into play.)
Personally however, at least where *big* vectors are involved, I'd want to see benchmarks done before ruling out gmp.

Cheers,
Rob
in accordance with the op's description of finding "every set bit in vector A, then set the corresponding bit(s) in vector B".

Okay. I missed that. Makes sense now.

A smarter approach would be to do B = A|B, or in gmp parlance mpz_ior(B, A, B);

'cept that the OP also said that the two vectors aren't in the same order (or even size) and the mapping isn't one to one. Rather, it is driven by pairs of numbers; a better description might be: if( vecA[ 123 ] ) then vecB[ 456 ] = vecB[ 012 ] = 1;

I'd want to see benchmarks done before ruling out gmp.

Me too. Though I don't think the size of the vectors will affect the results. Something like:

```c
int mytest3(SV* sv_vec, unsigned int bit) {
    STRLEN vecbytes;                              // Length of vector in bytes
    unsigned __int64 *vec = (unsigned __int64 *) SvPV(sv_vec, vecbytes);
    if( bit / 8 >= vecbytes ) return 0;           // Check in range
    vec[ bit / 64 ] |= ( 1ULL << ( bit % 64 ) );  // Set bit (CHANGES $vector)
    return 1;
}
```

will handle vectors up to 2 Exabytes with linear efficiency--assuming you had enough memory to hold it :)

Mind you, if I was going to be processing a list of pairs, then I'd move the acquisition of the vector addresses & lengths out of the loop and use macros for the bit testing and setting:

```c
#define REGSIZE 64
#define TSTVEC( v, b ) ( v[ b / REGSIZE ] &  ( 1ULL << ( b % REGSIZE ) ) )
#define SETVEC( v, b ) ( v[ b / REGSIZE ] |= ( 1ULL << ( b % REGSIZE ) ) )
```

In theory at least, the compiler is more likely to be able to optimise common sub-expressions if they do not cross function boundaries - even inlined function boundaries.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
'cept that the OP also said that the two vectors aren't in the same order (or even size) and the mapping isn't one to one

Aaah ... yes ... that makes it a little more challenging ....

Cheers,
Rob
Re^9: Bit vector fiddling with Inline C
by oxone (Friar) on May 10, 2011 at 08:11 UTC
On (1) I think he's just illustrating how to iterate/test/set bits via that library. On (2) I see what you mean and will do some tests to compare. Just guessing, but maybe the library method might prove to be quicker if it operates on words rather than bytes.
Just guessing, but maybe the library method might prove to be quicker if it operates on words rather than bytes.

I thought that using bigger, and particularly register sized chunks, might make some difference, given that loading/using sub-register sized operands is generally considered to be more expensive. However, I tried addressing the string as an array of both 32-bit and 64-bit ints:

```c
int mytest2(SV* sv_vec, unsigned int bit) {
    STRLEN vecbytes;                              // Length of vector in bytes
    unsigned int *vec = (unsigned int*) SvPV(sv_vec, vecbytes);
    if( bit / 8 >= vecbytes ) return 0;           // Check in range
    vec[ bit / 32 ] |= ( 1U << ( bit % 32 ) );    // Set bit (CHANGES $vector)
    return 1;
}

int mytest3(SV* sv_vec, unsigned int bit) {
    STRLEN vecbytes;                              // Length of vector in bytes
    unsigned __int64 *vec = (unsigned __int64 *) SvPV(sv_vec, vecbytes);
    if( bit / 8 >= vecbytes ) return 0;           // Check in range
    vec[ bit / 64 ] |= ( 1ULL << ( bit % 64 ) );  // Set bit (CHANGES $vector)
    return 1;
}
```

And whatever difference it made, if any, was entirely lost in the noise of benchmarking. The relative orderings of bytes/dwords/qwords interchange randomly with every run:

```
C:\test>903727.pl
          Rate qwords  bytes dwords    vec
qwords  3.05/s     --    -2%    -2%   -25%
bytes   3.13/s     2%     --    -0%   -23%
dwords  3.13/s     2%     0%     --   -23%
vec     7.70/s   151%   149%   147%     --

C:\test>903727.pl
          Rate qwords  bytes dwords    vec
dwords  3.10/s     --    -0%    -1%   -60%
bytes   3.11/s     0%     --    -0%   -60%
qwords  3.13/s     1%     0%     --   -59%
vec     7.69/s   148%   147%   146%     --
```

Of course, for best possible performance you would need to ensure that the pointer was register-size aligned. Perl does allocate strings with such alignment, though the SvOOK optimisation can mean that the pointer you receive from SvPV isn't so aligned if the scalar has been fiddled with after allocation; that is not the case in this benchmark.

My guess is that optimising compilers already generate code to "unroll the loop" for such accesses, and so this attempt at manual optimisation is unnecessary. I'd like to verify that by having the compiler produce the assembler, but every attempt I've made to pass additional compiler options to Inline C causes it to fail to build. Even with just CCFLAGS => '', it still fails, while without that option it succeeds :(

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
