PerlMonks  

Re^10: Bit vector fiddling with Inline C

by BrowserUk (Patriarch)
on May 10, 2011 at 14:34 UTC ( [id://903988]=note )


in reply to Re^9: Bit vector fiddling with Inline C
in thread Bit vector fiddling with Inline C

Just guessing, but maybe the library method might prove to be quicker if it operates on words rather than bytes.

I thought that using bigger, and particularly register-sized, chunks might make some difference, given that loading and using sub-register-sized operands is generally considered to be more expensive. So I tried addressing the string as an array of both 32-bit and 64-bit ints:

    int mytest2(SV* sv_vec, unsigned int bit) {
        STRLEN vecbytes;                            // Length of vector in bytes
        unsigned int *vec = (unsigned int*) SvPV(sv_vec, vecbytes);
        if( bit / 8 >= vecbytes) return 0;          // Check in range
        vec[ bit / 32 ] |= ( 1U << ( bit % 32 ) );  // Set bit (CHANGES $vector)
        return 1;
    }

    int mytest3(SV* sv_vec, unsigned int bit) {
        STRLEN vecbytes;                            // Length of vector in bytes
        unsigned __int64 *vec = (unsigned __int64 *) SvPV(sv_vec, vecbytes);
        if( bit / 8 >= vecbytes) return 0;          // Check in range
        vec[ bit / 64] |= ( 1ULL << ( bit % 64 ) ); // Set bit (CHANGES $vector)
        return 1;
    }

Whatever difference it made, if any, was entirely lost in the noise of benchmarking. The relative ordering of bytes/dwords/qwords changes randomly with every run:

    C:\test>903727.pl
             Rate qwords  bytes dwords    vec
    qwords 3.05/s     --    -2%    -2%   -25%
    bytes  3.13/s     2%     --    -0%   -23%
    dwords 3.13/s     2%     0%     --   -23%
    vec    7.70/s   151%   149%   147%     --

    C:\test>903727.pl
             Rate qwords  bytes dwords    vec
    dwords 3.10/s     --    -0%    -1%   -60%
    bytes  3.11/s     0%     --    -0%   -60%
    qwords 3.13/s     1%     0%     --   -59%
    vec    7.69/s   148%   147%   146%     --
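For anyone who wants to play with the three access widths outside of Inline C, here is a minimal standalone sketch of the same idea on a plain byte buffer. The function names (set_bit_bytes, set_bit_dwords, set_bit_qwords) are made up for illustration and are not part of the benchmark. It makes two assumptions that differ from the posted functions: the range check requires the whole word to lie inside the buffer (a check of bit / 8 against the byte length alone can let a word-sized write run past the end when the length is not a multiple of the word size), and it goes through memcpy rather than a pointer cast so that it does not depend on the buffer's alignment. Note also that the word-sized versions only produce the same byte layout as the byte version on a little-endian machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Set one bit, addressing the buffer a byte at a time. */
int set_bit_bytes(unsigned char *buf, size_t nbytes, size_t bit)
{
    if (bit / 8 >= nbytes) return 0;            /* out of range */
    buf[bit / 8] |= (unsigned char)(1u << (bit % 8));
    return 1;
}

/* Set one bit, addressing the buffer as 32-bit words. */
int set_bit_dwords(unsigned char *buf, size_t nbytes, size_t bit)
{
    size_t word = bit / 32;
    uint32_t w;
    if (word >= nbytes / 4) return 0;           /* whole word must fit */
    memcpy(&w, buf + word * 4, sizeof w);       /* alignment-safe load */
    w |= (uint32_t)1 << (bit % 32);
    memcpy(buf + word * 4, &w, sizeof w);       /* alignment-safe store */
    return 1;
}

/* Set one bit, addressing the buffer as 64-bit words. */
int set_bit_qwords(unsigned char *buf, size_t nbytes, size_t bit)
{
    size_t word = bit / 64;
    uint64_t w;
    if (word >= nbytes / 8) return 0;           /* whole word must fit */
    memcpy(&w, buf + word * 8, sizeof w);
    w |= (uint64_t)1 << (bit % 64);
    memcpy(buf + word * 8, &w, sizeof w);
    return 1;
}
```

With a 6-byte buffer, for example, set_bit_bytes accepts bit 40 but set_bit_dwords rejects it, because the second 32-bit word would need bytes 4 to 7.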

Of course, for best possible performance, you would need to ensure that the pointer was register-size aligned. Perl does allocate strings starting with such alignment, though the SvOOK optimisation can mean that the pointer you receive from SvPV isn't so aligned if the scalar has been fiddled with after allocation, but that is not the case in this benchmark.
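If you wanted to guard against the unaligned case, one way is to test the pointer before choosing the word-sized path. This is only a sketch: is_aligned is a hypothetical helper name, not anything from the Perl API, and it assumes the usual flat address model where converting a pointer to uintptr_t gives its numeric address.

```c
#include <stddef.h>
#include <stdint.h>

/* Return 1 if p sits on a multiple of `alignment` bytes, else 0.
   `alignment` must be non-zero, e.g. sizeof(uint64_t). */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

In an Inline C function you could call it on the pointer returned by SvPV and fall back to byte-at-a-time access when it returns 0.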

My guess is that optimising compilers already generate code to "unroll the loop" for such accesses, and so this attempt at manual optimisation is unnecessary. I'd like to verify that by having the compiler produce the assembler, but every attempt I've made to pass additional compiler options to Inline C causes it to fail to build. Even with CCFLAGS => '', it still fails, whereas without that option it succeeds :(


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
