Re^3: Bit vector fiddling with Inline C

Is casting the return of SvPV to an unsigned char* a sensible thing to be doing?

The union element returned by SvPV is defined as char *
```
#define _SV_HEAD_UNION \
    union {                \
    char*   svu_pv;        /* pointer to malloced string */    \
```
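On the cast itself, one general C point (not specific to Perl's API): whether plain `char` is signed is implementation-defined, so reading bit-vector bytes through an `unsigned char*` keeps shifts and masks predictable. A minimal standalone sketch, with no Perl headers; the names `top_bit` and `count_set_bits` are made up for illustration, and the `pv` argument merely stands in for whatever pointer SvPV hands back:

```c
#include <assert.h>
#include <stddef.h>

/* Return the top bit (0 or 1) of byte i, reading through unsigned char*.
 * With a plain char that happens to be signed, a byte such as 0x80 would
 * promote to a negative int, and right-shifting a negative value is
 * implementation-defined (commonly yielding -1 rather than 1). */
int top_bit(const char *pv, size_t i) {
    assert(pv != NULL);
    const unsigned char *bytes = (const unsigned char *)pv;
    return bytes[i] >> 7;
}

/* Count the set bits in a buffer viewed as unsigned bytes. */
size_t count_set_bits(const unsigned char *bytes, size_t len) {
    assert(bytes != NULL);
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char byte = bytes[i];
        while (byte) {
            count += byte & 1u;
            byte >>= 1;   /* unsigned shift: zeros come in from the left */
        }
    }
    return count;
}
```

Accessing any object's bytes through `unsigned char*` is permitted by the C standard, so the cast does not by itself introduce aliasing problems.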
Does directly changing the bytes of a Perl variable in C run the risk of 'breaking' Perl internals in some scenarios?

What you are doing in the sample code--neither lengthening nor shortening the Perl-allocated memory, only modifying the bits within it--should be about as safe as it gets.
There may be some risk of confusing Perl if you perform bitwise operations upon a string that is currently marked as something other than bytes--e.g. some form of UTF.
Perl might subsequently try to perform some operation upon the PV assuming it contains a valid UTF string, which might confuse it, but I wouldn't expect any dire consequences. I suspect that you could create the same situation by passing a UTF-encoded scalar to vec.
The simple answer is: don't use UTF. (That is, at least don't pass UTF strings to the function.)
Ideally, it would be possible to define a typemap that rejected attempts to pass UTF-encoded strings, but since Perl (along with the rest of the world) has chosen to conflate multiple data formats as a single type, there's not much that can be done in that regard.
Neither C nor any other language has a mechanism for typing arrays of variable-length entities, so the world is stuck with this mess until the powers that be see the problems it creates and do something sensible about it.
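To make the UTF caveat concrete, here is a small standalone C sketch (no Perl headers) showing how clearing a single bit can turn a well-formed UTF-8 sequence into a malformed one. Both `looks_like_utf8` and `clear_bit` are hypothetical helpers written for this illustration, and the check is deliberately simplified: it only verifies lead and continuation byte structure, not overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified structural check: every lead byte must be followed by the
 * right number of 10xxxxxx continuation bytes. Returns 1 if so, else 0. */
int looks_like_utf8(const unsigned char *s, size_t len) {
    assert(s != NULL);
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;
        if (c < 0x80)                extra = 0;  /* ASCII */
        else if ((c & 0xE0) == 0xC0) extra = 1;  /* 2-byte sequence */
        else if ((c & 0xF0) == 0xE0) extra = 2;  /* 3-byte sequence */
        else if ((c & 0xF8) == 0xF0) extra = 3;  /* 4-byte sequence */
        else return 0;          /* stray continuation or invalid lead byte */
        if (extra > len - i - 1) return 0;       /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((s[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* Clear one bit, numbering bits least-significant-first within each byte. */
void clear_bit(unsigned char *v, size_t bit) {
    assert(v != NULL);
    v[bit >> 3] &= (unsigned char)~(1u << (bit & 7));
}
```

For example, the two bytes 0xC3 0xA9 encode "é"; clearing bit 15 turns the second byte into 0x29, which is no longer a continuation byte, so the pair stops being well-formed UTF-8.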

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Bit vector fiddling with Inline C by oxone (Friar) on May 09, 2011 at 16:52 UTC
Thanks so much - that's incredibly helpful, and exactly the kind of feedback I was looking for. It's reassuring to know that my approach isn't likely to break anything (noting your caveat about UTF data).

I see one of the earlier responses above (also very helpful) raises the question of pass-by-value/reference. My understanding is that with a scalar parameter, Perl normally passes by value (i.e. a copy goes onto the stack) whereas in C, strings are always passed around by reference. I would assume from my example in the OP that the C world 'wins out' here and the $vector is passed to the C function by reference (even though it's called in Perl with $vector rather than \$vector)? I'm assuming that because changing it in C also changes the original scalar back in Perl.

That becomes important if $vector happens to be huge - it would otherwise be memcopied as part of the call (which I think is one of the points anonymized user 468275 raises about efficiency). I don't suppose your knowledge of the internals can confirm that the example in the OP is indeed just passing a pointer to $vector, and NOT copying the entire byte sequence somewhere else at the same time (even just as a side-effect)?

Update: After some further tests, this is a duff question. Even in pure Perl, calling a function with a very large scalar as a parameter does not immediately take up twice the memory by copying the variable. (I think that's because an alias to the variable is put onto @_, although I may be wrong?) The 'double the memory' effect only happens if, inside your function, you then assign it to another variable with something like 'my $var = shift'.
Re^5: Bit vector fiddling with Inline C by BrowserUk (Patriarch) on May 09, 2011 at 19:22 UTC
I don't suppose your knowledge of the internals can confirm that the example in the OP is indeed just passing a pointer to $vector, and NOT copying the entire byte sequence somewhere else at the same time (even just as a side-effect)?

Yes. I can confirm that. No copying is done. When you define an XS argument as `SV* sv_vec`, you are asking for a pointer to the SV. When you operate via that pointer, you are changing the original SV.

As with ordinary Perl subs, the subroutine receives aliases to the actual variables passed:
```
sub x{ ++$_ for @_ };; ( $a, $b, $c) = 12345..12347;; x( $a, $b, $c );; print $a, $b, $c;;
12346 12347 12348
```
No copying occurs unless the programmer assigns them to local vars:
```
sub x{ my( $a, $b, $c ) = @_; ++$_ for $a, $b, $c; }
```
If I am defining Perl subs to operate upon large scalars, and they are more complex than a couple of lines--at which point the `$_[0], $_[1]` nomenclature can become awkward--then I will use scalar refs:
```
sub xyz (\$) { my $rStr = shift; substr $$rStr, ...; vec $$rStr, ...; ... }
```
Which achieves the benefit of named variables without the cost of copying.

BTW: You still haven't mentioned what the "complex processing" you are performing in XS is? I ask because my instinctive reaction is that if you are performing boolean operations on whole pairs or more of large bit vectors, it is almost certainly quicker doing it in Perl.
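The pointer-versus-copy distinction can be seen in plain C as well. The sketch below is only an analogy: `toy_scalar` is a hypothetical stand-in for a scalar's string buffer, not Perl's real SV layout, and both function names are invented. One function writes through the pointer it was given (comparable to working via `SV*`), the other duplicates the buffer first (comparable to `my $var = shift`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a scalar's byte buffer (NOT the real SV). */
typedef struct {
    unsigned char *buf;
    size_t len;
} toy_scalar;

/* Writes through the pointer: the caller's buffer changes, nothing is copied. */
void set_bit_in_place(toy_scalar *s, size_t bit) {
    assert(s != NULL && (bit >> 3) < s->len);
    s->buf[bit >> 3] |= (unsigned char)(1u << (bit & 7));
}

/* Duplicates the whole buffer first, so memory use doubles and the
 * caller's data is untouched. Returns NULL if allocation fails;
 * the caller must free() the result. */
unsigned char *set_bit_in_copy(const toy_scalar *s, size_t bit) {
    assert(s != NULL && (bit >> 3) < s->len);
    unsigned char *copy = malloc(s->len);
    if (copy == NULL) return NULL;
    memcpy(copy, s->buf, s->len);
    copy[bit >> 3] |= (unsigned char)(1u << (bit & 7));
    return copy;
}
```

For a multi-megabit vector, the `memcpy` in the second function is exactly the cost the in-place version avoids.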
Re^6: Bit vector fiddling with Inline C by oxone (Friar) on May 09, 2011 at 21:54 UTC
Thanks! Re this point >>BTW: You still haven't mentioned what the "complex processing" you are performing in XS is?<< -- sorry, not trying to be mysterious, was just trying to keep the question focused!

One real example is this: there are 2 bit vectors of different sizes (each in the range of 1-8m bits) with irregular, many-to-many relationships between the bits in each vector. (The relationships are represented separately by a long array of pairs of ints.) One real function is to find every set bit in vector A, then set the corresponding bit(s) in vector B. So: simple bitwise ops are out of the question because the vectors are of different sizes.

I coded it first in pure Perl using the vec() function, but it was pretty slow (one call to vec() to test/set each bit). So, I switched to Inline::C for the heavy lifting and it's now over 20x faster (i.e. the entire test/set loop is inside one C function).
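The actual Inline::C function is not shown in the thread, so the following is only a guess at the shape of such a loop, written as standalone C without Perl headers. It walks the pair list rather than scanning vector A, which is one possible strategy among several. The `bit_pair` type, the function names, and the least-significant-bit-first numbering (chosen to match what vec() uses for 1-bit elements) are all assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair: bit `src` of vector A maps to bit `dst` of vector B. */
typedef struct {
    size_t src;
    size_t dst;
} bit_pair;

static int test_bit(const unsigned char *v, size_t bit) {
    return (v[bit >> 3] >> (bit & 7)) & 1;
}

static void set_bit(unsigned char *v, size_t bit) {
    v[bit >> 3] |= (unsigned char)(1u << (bit & 7));
}

/* For every pair whose source bit is set in `a`, set the destination bit
 * in `b`. The vectors may differ in size. Returns how many pairs fired. */
size_t propagate_bits(const unsigned char *a, size_t a_bits,
                      unsigned char *b, size_t b_bits,
                      const bit_pair *pairs, size_t npairs) {
    size_t fired = 0;
    for (size_t i = 0; i < npairs; i++) {
        assert(pairs[i].src < a_bits);   /* guard against stray indices */
        assert(pairs[i].dst < b_bits);
        if (test_bit(a, pairs[i].src)) {
            set_bit(b, pairs[i].dst);
            fired++;
        }
    }
    return fired;
}
```

Because the whole loop runs inside one C call, the per-bit cost is a shift and a mask rather than a Perl-level vec() call, which is consistent with the kind of speedup described above.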
Re^7: Bit vector fiddling with Inline C by syphilis (Archbishop) on May 10, 2011 at 02:12 UTC
Re^8: Bit vector fiddling with Inline C by BrowserUk (Patriarch) on May 10, 2011 at 07:06 UTC
Re^8: Bit vector fiddling with Inline C by oxone (Friar) on May 10, 2011 at 05:56 UTC


	PerlMonks