PerlMonks  

Re^3: Bit vector fiddling with Inline C

by BrowserUk (Patriarch)
on May 09, 2011 at 13:47 UTC


in reply to Re^2: Bit vector fiddling with Inline C
in thread Bit vector fiddling with Inline C

  1. Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

    The union element that SvPV returns (svu_pv) is defined as char *:

    #define _SV_HEAD_UNION \
        union { \
            char* svu_pv; /* pointer to malloced string */ \

  2. Whether directly changing the bytes of a Perl variable in C runs the risk of 'breaking' Perl internals in some scenarios?

    What you are doing in the sample code (neither lengthening nor shortening the Perl-allocated memory, only modifying the bits within it) should be about as safe as it gets.

    There may be some risk of confusing Perl if you performed bitwise operations upon a string that was currently marked as other than bytes, e.g. some form of utf.

    Perl might subsequently try to perform some operation upon the PV assuming it contains a valid utf string, which might confuse it, but I wouldn't expect any dire consequences. I suspect that you could create the same situation by passing a utf encoded scalar to vec.

    The simple answer is don't use utf. (That is, at least don't pass utf strings to the function.)

    Ideally, it would be possible to define a typemap that rejected attempts to pass utf encoded strings, but since perl (along with the rest of the world) has chosen to conflate multiple data formats as a single type, there's not much that can be done in that regard.

    Neither C nor any other language has a mechanism for typing arrays of variable-length entities, so the world is stuck with this mess until the powers that be see the problems it creates and do something sensible about it.
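
A minimal standalone C sketch of the kind of in-place bit fiddling discussed in points 1 and 2. It does not use the Perl API: `buf` and `len` stand in for the pointer and byte length that SvPV would hand back, and the names `bv_test`, `bv_set` and `bv_clear` are made up for illustration. Bits are numbered low-order first within each byte, which is the order vec uses for 1-bit elements. The cast to unsigned char * is there because plain char may be signed, and shifting or masking negative values invites surprises.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not part of the Perl API. `buf` and `len` stand in
   for the pointer and byte length that SvPV would hand back. */

/* Return 1 if bit number `bit` is set, 0 otherwise. */
int bv_test(const char *buf, size_t len, size_t bit)
{
    const unsigned char *p = (const unsigned char *)buf;
    assert(bit / 8 < len);              /* stay inside the existing buffer */
    return (p[bit / 8] >> (bit % 8)) & 1;
}

/* Set bit number `bit`; the buffer is never lengthened or shortened. */
void bv_set(char *buf, size_t len, size_t bit)
{
    unsigned char *p = (unsigned char *)buf;
    assert(bit / 8 < len);
    p[bit / 8] |= (unsigned char)(1u << (bit % 8));
}

/* Clear bit number `bit`, again only touching existing bytes. */
void bv_clear(char *buf, size_t len, size_t bit)
{
    unsigned char *p = (unsigned char *)buf;
    assert(bit / 8 < len);
    p[bit / 8] &= (unsigned char)~(1u << (bit % 8));
}
```

Only bytes that already exist are read or written, which is the "about as safe as it gets" case described above. In real XS code one might also check the SvUTF8 flag before touching the bytes; that is not shown here.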


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Bit vector fiddling with Inline C
by oxone (Friar) on May 09, 2011 at 16:52 UTC

    Thanks so much - that's incredibly helpful, and exactly the kind of feedback I was looking for. It's reassuring to know that my approach isn't likely to break anything (noting your caveat about UTF data).

    I see one of the earlier responses above (also very helpful) raises the question of pass-by-value/reference. My understanding is that with a scalar parameter, Perl normally passes by value (i.e. a copy goes onto the stack) whereas in C, strings are always passed around by reference.

    I would assume from my example in the OP that the C world 'wins out' here and the $vector is passed to the C function by reference (even though it's called in Perl with $vector rather than \$vector)? I'm assuming that because changing it in C also changes the original scalar back in Perl.

    That becomes important if $vector happens to be huge - it would otherwise be memcopied as part of the call (which I think is one of the points anonymized user 468275 raises about efficiency).

    I don't suppose your knowledge of the internals can confirm that the example in the OP is indeed just passing a pointer to $vector, and NOT copying the entire byte sequence somewhere else at the same time (even just as a side-effect)?

    Update: After some further tests, this is a duff question. Even in pure Perl, calling a function with a very large scalar as a parameter does not immediately take up twice the memory by copying the variable. (I think that's because an alias to the variable is put onto @_, although I may be wrong?) The 'double the memory' effect only happens if, inside your function, you then assign it to another variable with something like 'my $var = shift'.

      I don't suppose your knowledge of the internals can confirm that the example in the OP is indeed just passing a pointer to $vector, and NOT copying the entire byte sequence somewhere else at the same time (even just as a side-effect)?

      Yes. I can confirm that. No copying is done.

      When you define an XS argument as SV* sv_vec, you are asking for a pointer to the SV. When you operate via that pointer, you are changing the original SV.
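
A rough C analogy for the point above, not the real Perl internals: `MockSV` is a made-up struct that only mimics the idea of an SV holding a pointer to its string buffer, and both function names are hypothetical. It sketches why receiving a pointer means the bytes are not copied and why a change made through that pointer is visible to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up stand-in for an SV: just a buffer pointer and a length.
   The real SV layout in Perl is more involved. */
typedef struct {
    char  *pv;
    size_t len;
} MockSV;

/* Receives a pointer to the caller's MockSV: nothing is copied,
   and the write lands in the caller's own buffer. */
void flip_first_bit(MockSV *sv)
{
    assert(sv != NULL);
    if (sv->len > 0)
        ((unsigned char *)sv->pv)[0] ^= 1u;
}

/* Returns the address of the buffer the callee sees, to show it is
   the same memory the caller owns. */
const char *buffer_seen_by_callee(const MockSV *sv)
{
    assert(sv != NULL);
    return sv->pv;
}
```

Calling `flip_first_bit(&sv)` changes the first byte of whatever buffer `sv.pv` points at, and `buffer_seen_by_callee(&sv)` returns that same address, however large the buffer is.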

      As with ordinary perl subs, the subroutine receives aliases to the actual variables passed:

      sub x{ ++$_ for @_ };;
      ( $a, $b, $c ) = 12345..12347;;
      x( $a, $b, $c );;
      print $a, $b, $c;;
      12346 12347 12348

      No copying occurs unless the programmer assigns them to local vars:

      sub x{ my( $a, $b, $c ) = @_; ++$_ for $a, $b, $c; }

      If I am defining perl subs to operate upon large scalars, and they are more complex than a couple of lines--at which point the $_[0], $_[1] nomenclature can become awkward--then I will use scalar refs:

      sub xyz (\$) {
          my $rStr = shift;
          substr $$rStr, ...;
          vec $$rStr, ...;
          ...
      }

      Which achieves the benefit of named variables without the cost of copying.

      BTW: You still haven't mentioned what the "complex processing" you are performing in XS is?

      I ask because my instinctive reaction is that if you are performing boolean operations on whole pairs (or more) of large bit vectors, it is almost certainly quicker to do it in Perl.



        Thanks! Re this point >>BTW: You still haven't mentioned what the "complex processing" you are performing in XS is?<< -- sorry, not trying to be mysterious, was just trying to keep the question focused!

        One real example is this: there are 2 bit vectors of different sizes (each in the range of 1-8m bits) with irregular, many-to-many relationships between the bits in each vector. (The relationships are represented separately by a long array of pairs of ints.) One real function is to find every set bit in vector A, then set the corresponding bit(s) in vector B.

        So: simple bitwise ops are out of the question because the vectors are of different sizes. I coded it first in pure Perl using the vec() function, but it was pretty slow (one call to vec() to test/set each bit). So, I switched to Inline::C for the heavy lifting and it's now over 20x faster (ie. the entire test/set loop inside one C function).
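
The test/set loop described above might look roughly like the following in plain C. This is a guess at its shape, not oxone's actual code: the `Pair` struct, the function name `propagate_bits`, and the low-order-first bit numbering (the order vec uses for 1-bit elements) are all assumptions, and the Inline::C glue that would pull the buffers out of the Perl scalars is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical relationship record: bit `a` in vector A maps to bit `b` in B. */
typedef struct {
    size_t a;
    size_t b;
} Pair;

/* For every pair whose A bit is set, set the corresponding B bit.
   Returns how many pairs had their A bit set. */
size_t propagate_bits(const unsigned char *vec_a, size_t bits_a,
                      unsigned char *vec_b, size_t bits_b,
                      const Pair *pairs, size_t npairs)
{
    size_t hits = 0;
    for (size_t i = 0; i < npairs; i++) {
        size_t a = pairs[i].a, b = pairs[i].b;
        assert(a < bits_a && b < bits_b);   /* never index outside either vector */
        if ((vec_a[a / 8] >> (a % 8)) & 1) {
            vec_b[b / 8] |= (unsigned char)(1u << (b % 8));
            hits++;
        }
    }
    return hits;
}
```

The whole loop runs inside one C function, so the per-bit cost is a shift and a mask rather than a Perl-level call to vec(), which fits the kind of speed-up reported above. The two vectors can have different sizes because each pair carries its own index into each one.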
