Detecting whether UV fits into an NV

syphilis has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Detecting whether UV fits into an NV by stevieb (Canon) on Feb 26, 2020 at 01:45 UTC
"It annoys me that I can't find a way to detect and shift all of the trailing zero bits off in one hit - and that I instead have to detect and shift them off one at a time. " Depending what platform you're on, check if the `__builtin_ctz()` function is available to you. The `ctz` stands for 'count trailing zeros'. Use this to get the number of trailing zeros, then shift them all off at once. Update: I've added in `bitCount()`, which allows you to send in the number after stripping off the trailing zeros, and returns a count of the remaining bits. `#include <stdio.h> #include <math.h> unsigned int bitCount(int num); void main () { /* Here's your example number, in binary format for visual purpose +s */ int num = 0b10100000; int noTrailingZeros = num >> __builtin_ctz(num); unsigned int bitsRemain = bitCount(noTrailingZeros); printf( "Zeros: %d, No Zeros: %d, Bits remain: %d\n", num, noTrailingZeros, bitsRemain ); } unsigned int bitCount(int num) { unsigned int bits = 0; while (num) { bits++; num >>= 1; } return bits; }` [download] Output: `Zeros: 640, No Zeros: 5, Bits remain: 3` [download]	[reply] [d/l] [select]
Re: Detecting whether UV fits into an NV by dave_the_m (Monsignor) on Feb 26, 2020 at 09:27 UTC
Why not just cast the UV value to an NV, cast back to an UV, and see if the first and final values differ? Dave	[reply]
Re^2: Detecting whether UV fits into an NV by syphilis (Archbishop) on Feb 26, 2020 at 10:58 UTC
Why not just cast the UV value to an NV, cast back to an UV, and see if the first and final values differ? I think that checking whether the UV `arg == (UV)((NV)arg)` is a splendid idea. Interestingly, it's only very slightly faster (at least for the value ranges I've tested) than the XSub I posted at the beginning of this thread, but it's far, far simpler. As an aside, whilst this approach is quite simple to implement inside XS space, is it even possible to do inside perl space ? That is, inside a perl sub, how would one coerce a UV to an NV and then back to a UV ? (This aspect is not an issue for me - which is the reason that I've presented it "as an aside". Just curious as to if/how it's possible, that's all.) Thanks Dave. Cheers, Rob	[reply] [d/l]
Re^3: Detecting whether UV fits into an NV by dave_the_m (Monsignor) on Feb 26, 2020 at 12:48 UTC
In general perl goes to great trouble to use the type which will be as lossless as possible rather than the type you want, so it wouldn't be easy. Maybe messing about with pack/unpack? Dave.	[reply]
Re^3: Detecting whether UV fits into an NV by syphilis (Archbishop) on Feb 27, 2020 at 05:57 UTC
I think that checking whether the UV arg == (UV)((NV)arg) is a splendid idea Unfortunately this check fails for some UV values with less recent Microsoft Compilers - eg Visual Studio 2010 and earlier. No problems with Visual Studio 2017 and Visual Studio 2019. I suspect that Visual Studio 2015 is the oldest Microsoft Compiler that doesn't suffer from this problem, but I haven't confirmed that. This particular issue looks like another manifestation of the problem that sent me down this path in the first place. Here's the demo I ran: use strict; use warnings; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'EOC'; int uv_fits_double1( UV arg ) { if(arg == (UV)((NV)arg)) return 1; return 0; } int uv_fits_double2(UV arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) return 1; return 0; } EOC my $ls = 63; my @in = ( 18446744073709549568, 18446744073709139968); for(@in) { print "$_: ", uv_fits_double1($_),uv_fits_double2($_), "\n" +; } __END__ With Windows Server 2003 SP1 Platform SDK, this outputs: 18446744073709549568: 01 18446744073709139968: 01 Base 2 representations of the 2 Uvs is (resp): 1111111111111111111111111111111111111111111111111111100000000000 1111111111111111111111111111111111111111111110011011100000000000 showing that both are exactly representable by a double. [download] Cheers, Rob	[reply] [d/l]
Re^4: Detecting whether UV fits into an NV by pryrt (Abbot) on Feb 27, 2020 at 14:53 UTC
Re^3: Detecting whether UV fits into an NV by ikegami (Patriarch) on Feb 27, 2020 at 07:29 UTC
`my $x = ( 1 << 54 ) \| 1; say $x; # 18014398509481985 $x = unpack 'F', pack 'F', $x; $x = unpack 'J', pack 'J', $x; say $x; # 18014398509481984` [download] To support negative numbers, use `I` instead of `J` for negative numbers. The following is a little bit truer to a cast, but it's far more complicated, far more fragile (can fail for overloaded and magical vars), and technically uses XS: `use strict; use warnings; use feature qw( say ); use B qw( svref_2object ); my $x = ( 1 << 54 ) \| 1; say $x; # 18014398509481985 # Add the value as an NV to the scalar. { no warnings qw( void ); log $x; } # Make it so the scalar contains just an NV. $x = svref_2object(\$x)->NV; # Add the value as an IV or UV to the scalar. { no warnings qw( void ); $x \| 1; } # Make it so the scalar contains just an IV or UV. $x = svref_2object(\$x)->int_value; say $x; # 18014398509481984` [download]	[reply] [d/l] [select]
Re^4: Detecting whether UV fits into an NV by syphilis (Archbishop) on Feb 27, 2020 at 10:42 UTC
Re: Detecting whether UV fits into an NV by stevieb (Canon) on Feb 26, 2020 at 02:33 UTC
Here's a hastily thrown together short program that does all the work. Removes trailing zeros, counts the number of bits remaining in the number, and informs you whether the number fits within 53 bits. #include <stdio.h> #include <stdint.h> #include <math.h> #define MAXLEN 53 unsigned int bitCount(unsigned long num); uint8_t doesItFit(unsigned long num); void main () { int num = 0b10100000; unsigned long bigNum = 18446744073709551615; // 2^64-1 printf( "Fit? Small: %d, Big: %d\n", doesItFit(num), doesItFit(bigNum) ); } unsigned int bitCount(unsigned long num) { unsigned int bits = 0; while (num) { bits++; num >>= 1; } return bits; } uint8_t doesItFit (unsigned long num) { num = num >> __builtin_ctz(num); unsigned int numBits = bitCount(num); return numBits <= MAXLEN ? 1 : 0; } [download] Output: `Fit? Small: 1, Big: 0` [download]	[reply] [d/l] [select]
Re: Detecting whether UV fits into an NV by GrandFather (Saint) on Feb 26, 2020 at 03:29 UTC
Is this an "It annoys me because" it doesn't look elegant my application needs it to be performant and even a short loop isn't some other reason I'm going to tell you about If it is case 1 then I suspect you are out of luck. The only other ways I can think of doing it are even worse, or hardware dependent, or are a maintenance nightmare. For case 2 there are techniques like binary searches that maybe can run faster or a really nasty looking series of if statements that amount to unrolling the while loop. Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond	[reply]
Re^2: Detecting whether UV fits into an NV by syphilis (Archbishop) on Feb 26, 2020 at 05:58 UTC
I'm just looking at the lack of elegance - that's pretty much it. I've no objection to inelegant code (of which I've written heaps) if it needs to be that way. If not for the portability concerns, I'd pick up on stevieb's suggestion and just have the XSub as : `int uv_fits_double(UV arg) { arg >>= __builtin_ctzll(arg); if(arg < 9007199254740993) return 1; return 0; }` [download] That seems to work fine on MinGW-built Windows perls, where $Config{ivtype} is 'long long', and I'd be happy with that even if it wasn't any more efficient. I didn't know about the existence of __builtin_ctz() and friends. Thanks, stevieb for bringing it to my attention. I'll do some benchmarking later on, and see how much time it saves. It's an option that could come in handy to me in the future. I guess that, mainly, I just want to know if there are some more elegant (&& portable) options available. Cheers, Rob	[reply] [d/l]
Re: Detecting whether UV fits into an NV by jcb (Parson) on Feb 27, 2020 at 03:10 UTC
If you have RAM to spare, (you are using perl, of course you do) you could build a 2048-entry lookup table, mask `arg` with 2047, shift once with the value from the table and then compare with 9007199254740993 once. Something like: (untested) `static uint8_t ctz_table_11bit[2048] = { /* elided / }; int uv_fits_double(UV arg) { arg >>= ctz_table_11bit[arg & 2047]; return arg < 9007199254740993; }` [download] The relative elegance of this solution is probably a matter of opinion, but it should run much faster, since it is theoretically a single basic block and should be a candidate for inlining. This trades considerable computation for a single memory read, which will have relative performance depending on your hardware. Here is some Perl code to generate that table: (I think this is the first time I have used formats; output spot checked for plausibility) #!/usr/bin/perl use strict; use warnings; use constant BITS => 11; our @pos; our @bits; our @val; format STDOUT_TOP= . $= = 1<<(BITS-1); format STDOUT = / @### -> @0######### / @#, / @### -> @0######### / @#, $pos[0], $bits[0], $val[0], $pos[1], $bits[1], $val[1], . print "static uint8_t ctz_table_",BITS,"bit[",1<<BITS,"] = {\n"; for (my $pos = 0; $pos < (1<<BITS); $pos += 2) { @pos = ($pos, $pos+1); @bits = map substr(unpack('B', pack 'n', $_), -(BITS)), @pos; @val = map length((m/(0?)\z/)[0]), @bits; write } print "};\n"; # EOF [download] Update:* change two magic numbers to use symbolic constant.	[reply] [d/l] [select]
Re^2: Detecting whether UV fits into an NV by syphilis (Archbishop) on Mar 02, 2020 at 08:07 UTC
Thanks - that's certainly one way to implement ctz. (Nice !!) In general, according to my benchmarking, there's very little speed gain in using this method in an XSub as compared to using this: `int uv_fits_double (UV arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) return 1; return 0; }` [download] Obviously, for certain values there will be a significant gain, but for a random selection of values it works out at pretty much the same. For the record, I've based the above assertion on the output of the script that's presented below in the readmore section. Cheers, Rob Read more... (9 kB)	[reply] [d/l] [select]
Re^3: Detecting whether UV fits into an NV by jcb (Parson) on Mar 03, 2020 at 00:31 UTC
I am not particularly surprised. The number in the table is also the number of times the loop will iterate in the simple implementation and literally half of the table entries are zero. Of the remaining half, half are one. Of that remaining half, half are two, and so forth. The cost of a memory read on modern hardware seems to be high enough that the lookup table does not give enough of an advantage. Perhaps the bit-twiddling solution roboticus offered will work better for you? Or is your testing script swamping the function under test with Perl-side type conversion and loop overhead? I suggest moving the iteration into C instead of walking a Perl array, since I suspect that both of the solutions in your test probably have running times dominated by the Perl `for` iterating over the array and the XS magic that converts an `SV` into a `UV`. Moving that iteration into C is probably necessary in any case: if you will be calling this predicate from Perl code, you need to know the overhead for that call itself to determine if optimizing `uv_fits_double` will actually provide any overall performance gain. Also note that `uv_fits_double3` as presented here enters an infinite loop if given zero as an argument.	[reply] [d/l] [select]
Re^4: Detecting whether UV fits into an NV by syphilis (Archbishop) on Mar 04, 2020 at 04:07 UTC
Re: Detecting whether UV fits into an NV by roboticus (Chancellor) on Feb 27, 2020 at 14:31 UTC
syphilis: I poked around with it and came up with something for you ~~to figure it out~~ without the looping. $ cat funky_bits.cc // funky_bits.cpp // // experiment with odd bit handling? // // 20200227 #include <stdio.h> // Some test cases unsigned long long foo[] = { 0xcab312, 0xdeadbeef, 0xdeadbeef0, 0xdeadbeef000, 0x138, 9007199254740992ULL, 18446744073709551615ULL }; // Needs a good name, and the type shenanigans should be cleaned up bool duzzit_fit(unsigned long long t) { // First we need to identify the lowest bit set long long neg_t = -(long long)t; long long last_set = t & neg_t; // Shift it left 52 bits to get location of lowest invalid bit // NOTE: If smallest invalid bit is far enough left, then this wil +l turn into 0 // making the invalid bits also 0, which happens to be OK! long long smallest_invalid = last_set << 52; // Convert it to a bit mask for the smallest invalid bit and those + to the left. unsigned long long valid_bits = smallest_invalid - 1; unsigned long long invalid_bits = ~valid_bits; // Uncomment for debugging //printf( " t:%16llx\n" // " last_set:%16llx\n" // " neg_t:%16llx\n" // "smallest_invalid:%16llx\n" // " valid_bits:%16llx\n" // " invalid_bits:%16llx\n", // t, last_set, neg_t, smallest_invalid, // valid_bits, invalid_bits); // It's a 'valid' number unless one of the upper-order invalid bit +s are set return ! (t & invalid_bits); } int main(int, char ) { for (int z=0; z<sizeof(foo)/sizeof(foo[0]); ++z) { unsigned long long i = foo[z]; bool b = duzzit_fit(i); printf("%lld: %s\n", i, b ? "OK" : "NOPE"); } return 0; } $ clang -o funky_bits funky_bits.cc $ ./funky_bits 13284114: OK 3735928559: OK 59774856944: OK 15302363377664: OK 312: OK 9007199254740992: OK -1: NOPE** [download] Cleaning it up, adding more test cases, further testing, debugging and simplification are all left as exercises for the reader... ;^D I was originally thinking of making a lookup table, but thought of this along the way. Update: fixed the abomination that was the first sentence. ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l]
Re^2: Detecting whether UV fits into an NV by syphilis (Archbishop) on Mar 04, 2020 at 07:48 UTC
I was originally thinking of making a lookup table, but thought of this along the way. Thanks for that - I still haven't worked my way through it properly. It seems to work fine now that I've changed: `long long smallest_invalid = last_set << 52; to long long smallest_invalid = last_set << 53;` [download] Without that change, the following values (for example) wrongly reported "NOPE": `1844674407366955264, 1844674407366955776, 1844674407366956288, 1844674407366956800, 1844674407366957312, 1844674407366957824, 1844674407366958336, 1844674407366958848, 1844674407366959360, 1844674407366959872, 1844674407366960384, 1844674407366960896, 1844674407366961408, 1844674407366961920, 1844674407366962432, 1844674407366962944, 1844674407366963456, 1844674407366963968, 1844674407366964480, 1844674407366964992` [download] In the following XS adaptation it's taking over twice as long as uv_fits_double3. use warnings; use Benchmark; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'EOC'; int uv_fits_double3(SV * x, ...) { dXSARGS; int i, count = 0; UV arg; for(i = 0; i < items; i++) { arg = SvUV(ST(i)); if(arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) count++; } } return count; } int uv_fits_double_bitfiddle(SV * t, ...) { dXSARGS; int i, count = 0; for(i = 0; i < items; i++) { /* First we need to identify the lowest bit set / IV neg_t = -(SvIV(ST(i))); IV last_set = SvUV(ST(i)) & neg_t; / Shift it left 53 bits to get location of lowest invalid bit + * * NOTE: If smallest invalid bit is far enough left, then this wil +l * * turn into 0 making the invalid bits also 0, which happens to be + OK! / IV smallest_invalid = last_set << 53; UV valid_bits = smallest_invalid - 1; UV invalid_bits = ~valid_bits; / It's a 'valid' number unless one of the upper-order invalid bit +s are set */ if(!(SvUV(ST(i)) & invalid_bits)) count++; } return count; } EOC @in = (1844674407366955161 .. 1844674407378955161); # @in = (9007199248740992 .. 9007199260740992); # @in = (184467436737095 .. 184467448737095); # @in = (184463440737 .. 184475440737); die "wrong size" unless @in == 12000001; timethese (1, { 'uv_fits_double3' => '$count1 = uv_fits_double3(@in);', 'uv_fits_double_bitfiddle' => '$count2 = uv_fits_double_bitfiddle(@in) +;', }); print "$count1 $count2\n"; die "Error in at least one of the XSubs" unless $count1 == $count2; __END__ [download] Benchmarking results on my Windows 7 machine with perl-5.30.0 are: `Benchmark: timing 1 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 0 wallclock secs ( 0.44 usr + 0.00 sys = 0.44 CPU) + @ 2.29/s (n=1) (warning: too few iterations for a reliable count) uv_fits_double_bitfiddle: 1 wallclock secs ( 0.94 usr + 0.00 sys = +0.94 CPU) @ 1.07/s (n=1) (warning: too few iterations for a reliable count) 12000001 12000001` [download] Can you see room for significant improvement in performance of uv_fits_double_bitfiddle ? Thanks for another interesting approach !! Cheers, Rob	[reply] [d/l] [select]
Re^3: Detecting whether UV fits into an NV by roboticus (Chancellor) on Mar 04, 2020 at 23:33 UTC
Rob: I'm not experienced in using Inline::C, but I thought I'd take a look at it. My first guess is that the overhead in `SvUV(ST(i))` is causing twiddle to be slower, so I tweaked the routine to explicitly pull arg into a UV as you did in double3: `$ diff pm_11113750_orig.pl pm_11113750_a.pl 26a27 > UV arg; 30,31c31,33 < IV neg_t = -(SvIV(ST(i))); < IV last_set = SvUV(ST(i)) & neg_t; --- > arg = SvUV(ST(i)); > IV neg_t = -arg; > IV last_set = arg & neg_t; 42c44 < if(!(SvUV(ST(i)) & invalid_bits)) count++; --- > if(!(arg & invalid_bits)) count++;` [download] Doing that change alone made the routines roughly the same speed: $ PERL5OPT= perl pm_11113750_orig.pl Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 7 wallclock secs ( 6.70 usr + 0.00 sys = 6.70 CPU) + @ 0.75/s (n=5) uv_fits_double_bitfiddle: 17 wallclock secs (17.36 usr + 0.00 sys = 1 +7.36 CPU) @ 0.29/s (n=5) 46875 46875 $ PERL5OPT= perl pm_11113750_a.pl Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 7 wallclock secs ( 6.75 usr + 0.00 sys = 6.75 CPU) + @ 0.74/s (n=5) uv_fits_double_bitfiddle: 6 wallclock secs ( 6.67 usr + 0.00 sys = +6.67 CPU) @ 0.75/s (n=5) 46875 46875 [download] Since the twiddle part doesn't have any loops, I would expect it to be significantly faster than a version with loops, depending on how clever the optimizer is. So another version I played with was to modify the routine to take two parameters and count the number of values that fit between those two values. That way, I could greatly reduce the impact of the SvUV() function at the cost of only being able to process sequences of numbers. That version confirmed my suspicion that the SvUV() function is slow enough that it swamps the difference in our algorithms. int uv_fits_double3x(SV* the_min, SV* the_max) { dXSARGS; UV i_min = SvUV(the_min); UV i_max = SvUV(the_max); UV i; int count = 0; UV arg; for(i = i_min; i < i_max; i++) { arg = i; if(arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) count++; } } return count; } int uv_fits_double_bitfiddle_3x(SV* the_min, SV* the_max) { dXSARGS; UV i_min = SvUV(the_min); UV i_max = SvUV(the_max); UV i; int count = 0; for(i = i_min; i < i_max; i++) { UV arg = i; IV neg = -arg; IV smallest_invalid = (arg & -arg)<<53; UV valid_bits = smallest_invalid-1; UV invalid_bits = ~valid_bits; if (! (arg & invalid_bits)) count++; } return count; } [download] Both routines are much faster if they only take two arguments and iterate over the entire range. Only then does the twiddle version show a significant speed advantage at about twice as fast. The double3x version is about 1800 times faster, while fiddle3x is about 2900 times faster: $ PERL5OPT= perl pm_11113750_x.pl Name "main::count4" used only once: possible typo at pm_11113750_x.pl +line 147. === Test 1: roughly equivalent speed Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 28 wallclock secs (27.22 usr + 0.08 sys = 27.30 CPU) + @ 0.18/s (n=5) uv_fits_double_bitfiddle: 27 wallclock secs (26.72 usr + 0.00 sys = 2 +6.72 CPU) @ 0.19/s (n=5) 33046878 33046878 (48000004) === TEST 2: twiddle has an advantage if we don't use SvUV() Benchmark: timing 10000 iterations of bitfiddle_3x, double3x, empty_fu +nc, empty_loop... bitfiddle_3x: 18 wallclock secs (18.39 usr + 0.00 sys = 18.39 CPU) @ +543.74/s (n=10000) double3x: 30 wallclock secs (29.55 usr + 0.00 sys = 29.55 CPU) @ 33 +8.44/s (n=10000) empty_func: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count) empty_loop: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count) 33046875 33046875 48000000 1416858175 [download] Unfortunately, I've not been successful in making a useful version that's faster in a real-world scenario. My current understanding of things (after playing with it for a few hours) is that the single argument versions get swamped out by the looping and function calling overhead. The functions are just too simple to be aHowever, I've not been successful in making a useful version that's significantly faster (with call overhead and such). NOTES: As you can tell from the above, I've not been successful in measuring the function call (empty func) or empty loop versions of the functions, so my benchmarking could be broken, so I'm including the current code, so people can point things out. Read more... (5 kB) ...roboticus When your only tool is a hammer, all problems look like your thumb.	[reply] [d/l] [select]
Re^4: Detecting whether UV fits into an NV by syphilis (Archbishop) on Mar 05, 2020 at 01:38 UTC
Re^5: Detecting whether UV fits into an NV by roboticus (Chancellor) on Mar 05, 2020 at 03:38 UTC
Some notes below your chosen depth have not been shown here


Don't ask to ask, just ask
	PerlMonks