PerlMonks  

Bit vector fiddling with Inline C

by oxone (Friar)
on May 09, 2011 at 08:09 UTC ( [id://903727]=perlquestion )

oxone has asked for the wisdom of the Perl Monks concerning the following question:

I'd be very grateful for any feedback from experts on Inline C and Perl internals. I'm working with bit vectors and creating several Inline C functions which (among other things) need to test/set bits.

The example code below shows how I pass a bit vector from Perl into C and test/set bits there, so that the original Perl variable is changed (which is the intent).

use strict;
use warnings;
use Inline C => 'DATA';

my $vector = "\0" x 100;                            # Bit vector with 800 bits
mytest($vector, 12);                                # Pass it to Inline C function
print "Result:" . vec($vector, 12, 1) . "\n";       # Prints '1'

__DATA__
__C__
int mytest(SV* sv_vec, unsigned int bit) {
    STRLEN vecbytes;                                // Length of vector in bytes
    unsigned char *myvec = (unsigned char *) SvPV(sv_vec, vecbytes);
    if (bit/8 >= vecbytes) return 0;                // Check in range
    if (myvec[bit/8] & 1U<<(bit%8)) return 1;       // Test if a bit is set
    myvec[bit/8] |= 1U<<(bit%8);                    // Set bit (CHANGES $vector)
    return 1;
}

This works, but I am wondering if this approach is safe & correct, or if there are hidden gotchas or better/more efficient ways to do this with Inline C?
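
As a standalone illustration of the indexing used in the code above, here is a minimal sketch in plain C with no Perl headers. The bv_* names are hypothetical and not part of the original post. Bit n lives in byte n/8, at position n%8 counting from the low bit, which is the same layout vec() uses for 1-bit elements; that is why vec($vector, 12, 1) sees the bit that was set from C.

```c
#include <assert.h>
#include <stddef.h>

/* Byte index and mask for bit n, using the same layout as the code above:
 * bit n lives in byte n/8, at position n%8 counting from the low bit. */
static size_t bv_byte_index(size_t bit) { return bit / 8; }
static unsigned char bv_mask(size_t bit) { return (unsigned char)(1U << (bit % 8)); }

/* Read one bit. Returns 0 or 1, or -1 if the bit lies beyond the buffer. */
static int bv_get(const unsigned char *vec, size_t nbytes, size_t bit)
{
    assert(vec != NULL);
    if (bv_byte_index(bit) >= nbytes)
        return -1;
    return (vec[bv_byte_index(bit)] & bv_mask(bit)) ? 1 : 0;
}
```

For bit 12 this gives byte index 1 and mask 0x10. In an Inline C function the pointer and byte count would come from SvPV, as in the original code.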

Replies are listed 'Best First'.
Re: Bit vector fiddling with Inline C
by chrestomanci (Priest) on May 09, 2011 at 08:34 UTC

    Just checking, but did you know that perl also has bitwise operators? You can do your bit vector operations in perl just as easily, and if you are more familiar with perl you can try out bits of syntax in the perl debugger as usual. Use sprintf '%b' or sprintf '%032b' to render a number in binary, and the 0b prefix to write a binary literal.

    eg to try out a bitwise OR:

    sprintf '%032b', ( 0b1100 | 0b0110 )

    Having said that, if your inline function needs to do a lot more than just basic bitwise manipulations, then there is no reason why you should not do it in C instead of perl. I am fairly sure that the perl and C syntax are the same, though the operator precedence might be different. I suggest you try and see what happens. You can debug with lots of printf statements or use another debugger.

      Thanks for the suggestions! Am familiar with bitwise ops in Perl (and with the vec() function which is also very handy here). To clarify: example code is simplified to illustrate the Perl/C interface which is the nub of the question here - the real Inline C is doing a lot more, such that Perl's bitwise ops and vec() aren't enough.
Re: Bit vector fiddling with Inline C
by BrowserUk (Patriarch) on May 09, 2011 at 12:33 UTC

    (Warning: very limited expertise.)

    Nothing obviously wrong leaps out from what you've posted.

    But that isn't going to be any quicker to use than the equivalent Perl code:

    vec( $vector, $bit, 1 ) ||= 1;

    A quick test shows that it is considerably slower:

    use strict;
    use warnings;
    use Benchmark qw[ cmpthese ];
    use Inline C => 'DATA';

    my( $vec1, $vec2 ) = ( ( chr(0) x 125000 ) x 2 );

    cmpthese -3, {
        inline => sub {
            mytest( $vec1, $_ ) for 0 .. 1e6-1;
        },
        vec => sub {
            vec( $vec2, $_, 1 ) ||= 1 for 0 .. 1e6-1;
        },
    };

    warn "Different results" unless $vec1 eq $vec2;

    __DATA__
    __C__
    int mytest(SV* sv_vec, unsigned int bit) {
        STRLEN vecbytes;                                // Length of vector in bytes
        unsigned char *myvec = (unsigned char *) SvPV(sv_vec, vecbytes);
        if (bit/8 >= vecbytes) return 0;                // Check in range
        if (myvec[bit/8] & 1U<<(bit%8)) return 1;       // Test if a bit is set
        myvec[bit/8] |= 1U<<(bit%8);                    // Set bit (CHANGES $vector)
        return 1;
    }

    Results:

    C:\test>903727.pl
             Rate inline    vec
    inline 3.11/s     --   -60%
    vec    7.77/s   150%     --

    C:\test>903727.pl
             Rate inline    vec
    inline 3.10/s     --   -60%
    vec    7.77/s   151%     --

    You mention in a later post that "The real case applies some fairly complex logic to multiple large bit vectors (each 1m+ bits) which runs a lot faster in C.". That raises a couple of questions:

    • If the real code is so complex, why are you asking us to make judgements based on such a trivial example that can never meet its stated goal of greater efficiency?
    • The way you've asked the question suggests that you are unsure about the parameter handling rather than the actual internal logic. Where exactly do your doubts lie?

    Update: There is also the question of why you are setting the bit conditionally. Your calling code can never tell the difference between the case where the bit was previously unset and the case where it was already set.

    So why bother testing if it is set and not just set it?
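
One way to make the test worthwhile would be to return the bit's previous state, so the caller can tell the two cases apart. The following is a minimal sketch in plain C, not the OP's actual function; bv_test_and_set and its return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Set bit `bit` in a caller-owned buffer of `nbytes` bytes and report
 * what was there before:
 *   -1  bit is out of range (buffer unchanged)
 *    0  bit was clear and has now been set
 *    1  bit was already set (buffer unchanged)                        */
static int bv_test_and_set(unsigned char *vec, size_t nbytes, size_t bit)
{
    assert(vec != NULL);
    size_t byte = bit / 8;
    unsigned char mask = (unsigned char)(1U << (bit % 8));

    if (byte >= nbytes)
        return -1;

    int was_set = (vec[byte] & mask) != 0;
    vec[byte] |= mask;   /* unconditional OR: leaves the byte as-is if already set */
    return was_set;
}
```

If the caller does not need the previous state, the test can be dropped and the OR applied unconditionally.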


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Thanks for the useful benchmarking info vs Perl's vec(), and my apologies for lack of clarity in the OP. In answer to your queries:

      >>If the real code is so complex, why are you asking us to make judgements based on such a trivial example that can never meet its stated goal of greater efficiency?<< I was trying to present a 'minimal case' to illustrate the particular aspects I'm unsure about.

      >>The way you've asked the question suggests that you are unsure about the parameter handling rather than the actual internal logic. Where exactly do your doubts lie?<< What I'm unsure about is two things:

      (a) Correct method for directly accessing the bytes of a Perl variable from Inline C? My example works but I'm not entirely sure it's the "right way". Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

      (b) Whether directly changing the bytes of a Perl variable in C runs the risk of 'breaking' Perl internals in some scenarios?

        1. Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

          The union element returned by SvPV is defined as char *

          #define _SV_HEAD_UNION \
              union { \
                  char*   svu_pv;     /* pointer to malloced string */ \
        2. Whether directly changing the bytes of a Perl variable in C runs the risk of 'breaking' Perl internals in some scenarios?

          What you are doing in the sample code--neither lengthening nor shortening the perl allocated memory, only modifying the bits within it--should be about as safe as it gets.

          There may be some risk of confusing Perl if you performed bitwise operations upon a string that was currently marked as other than bytes--eg. some form of utf.

          Perl might subsequently try to perform some operation upon the PV assuming it contains a valid utf string, which might confuse it, but I wouldn't expect any dire consequences. I suspect that you could create the same situation by passing a utf encoded scalar to vec.

          The simple answer is don't use utf. (That is, at least don't pass utf strings to the function.)

          Ideally, it would be possible to define a typemap that rejected attempts to pass utf encoded strings, but since perl (along with the rest of the world) has chosen to conflate multiple data formats as a single type, there's not much that can be done in that regard.

          Neither C nor any other language has a mechanism for typing arrays of variable length entities, so the world is stuck with this mess until the powers that be see the problems it creates and do something sensible about it.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Bit vector fiddling with Inline C
by anonymized user 468275 (Curate) on May 09, 2011 at 09:34 UTC
    Is the data very large or is there some kind of external DMA interface requirement? Or where exactly is Perl not delivering in this particular case?

    One world, one people

      It just comes down to speed. The real case applies some fairly complex logic to multiple large bit vectors (each 1m+ bits) which runs a lot faster in C. What I have works well, but as noted in the OP I'm unsure if my approach is risky vs Perl's internals.
        OK, now we know a bit more, here are a couple of thoughts. Repeatedly manipulating such data in Perl, even just for interfacing, will be slow. There are C macros (see perlguts) for managing data declared in Perl and passed by reference (C-style, not Perl-style). But I'd aim to get it into C static storage (scoped just outside your C routines) as soon as possible and leave it there, calling C routines from Perl at every point where it needs to be used in any way. Preferably load it directly into C in the first place, and avoid reallocating C memory repeatedly: when completely done with one 1Mb chunk, reuse the same static storage for the next chunk to process.

        Don't use the SV char* type. Use e.g. static int fred[250000] (update: unsigned), declared after __C__ but before the C routines, for your C storage, to avoid ASCII-zero truncation by libc whenever a null byte is encountered.
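
A minimal sketch of the static-storage idea in plain C follows. The names store, store_len and store_load, and the 125000-byte size, are hypothetical. It also illustrates the null-byte point: truncation comes from NUL-terminated string functions such as strlen, while a copy driven by an explicit byte count, such as the length SvPV reports, keeps every byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STORE_BYTES 125000   /* 1,000,000 bits; the size is an assumption */

/* File-scope storage, reused for each successive vector. */
static unsigned char store[STORE_BYTES];
static size_t store_len = 0;

/* Copy `len` bytes into the static store using an explicit length, so
 * embedded zero bytes are copied like any other byte.
 * Returns the number of bytes stored, or 0 if `len` exceeds capacity. */
static size_t store_load(const unsigned char *src, size_t len)
{
    assert(src != NULL);
    if (len > STORE_BYTES)
        return 0;
    memcpy(store, src, len);
    store_len = len;
    return len;
}
```

For a four-byte input of 0x01, 0x00, 0x02, 0x00, strlen reports 1 but store_load copies all four bytes.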

        One world, one people

Node Type: perlquestion [id://903727]
Approved by Corion
Front-paged by Old_Gray_Bear