Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Detecting whether UV fits into an NV

by syphilis (Archbishop)
on Feb 26, 2020 at 01:26 UTC ( [id://11113419]=perlquestion: print w/replies, xml ) Need Help??

syphilis has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I was initially going to mark this post as off-topic ... still not entirely sure whether I should have ....
The question relates specifically to perls for which both $Config{ivsize} and $Config{nvsize} are both 8. That is, perl's integer type (UV/IV) is 64-bit, and perl's floating point type (NV) is either a double or an 8-byte long double.

The aim is to determine whether a given integer value can be represented exactly as a double.
Clearly, any integer <= 9007199254740992 can be represented exactly as a double. (9007199254740992 == 2 ** 53.)
In addition to those values, however, any integer whose highest set bit and lowest set bit are separated by 51 or fewer bits is also exactly representable as a double.

Here follows my solution. The question is "Is there a better way ?".
Assume that the given arg is an integer in the range 0 .. 18446744073709551615 (with 18446744073709551615 being the largest possible UV value).
As an XSub:
int uv_fits_double(UV arg) { if(arg < 9007199254740993) return 1; while(!(arg & 1)) { arg >>= 1; if(arg < 9007199254740993) return 1; } return 0; }
And as perl sub:
sub uv_fits_double { my $arg = shift; return 1 if $arg < 9007199254740993; while(!($arg & 1)) { $arg >>= 1; return 1 if $arg < 9007199254740993; } return 0; }
It annoys me that I can't find a way to detect and shift all of the trailing zero bits off in one hit - and that I instead have to detect and shift them off one at a time.
The number of times that the "$arg < 9007199254740993" comparison is evaluated also annoys me. (Doing that evaluation inside the while loop means that the while loop will perform a maximum of 11 cycles. Without that evaluation it could perform up to 63 cycles.)

As I understand it, the only thing I need to determine is "Are there more than 51 bits between the highest set bit and the lowest set bit ?", and I do that by shifting off all trailing unset bits so that I can then determine (by examining the remaining value) whether it fits into 53 bits or not.
It feels like there ought to be a quicker, simpler way of doing it ... but I don't see one.
A perl demo of the uv_fits_double sub:
use strict; use warnings; use Config; die "This script not meant for this perl configuration" unless $Config{ivsize} == $Config{nvsize}; # The integer value 2251799813685249 is # representable exactly as a double. # Therefore 2251799813685249 * (2 ** 10) # is exactly representable as a double, # since 10 is well within the allowable # exponent range. # 2251799813685249 * (2 ** 10) is also # within the bounds of allowable integer # values. my $fits_d = 2305843009213694976; # 2251799813685249 * (2 ** 10) my $no_fits_d = $fits_d + (2 ** 6); print uv_fits_double(2251799813685249); # fits print uv_fits_double($fits_d); # fits print uv_fits_double($no_fits_d); # doesn't fit print uv_fits_double($fits_d + (2 ** 15)); # fits print "\n"; sub uv_fits_double { my $arg = shift; return 1 if $arg < 9007199254740993; while(!($arg & 1)) { $arg >>= 1; return 1 if $arg < 9007199254740993; } return 0; } __END__ Should output 1101 The 4th value fits, even though it's greater than the 3rd value (which doesn't fit).
Cheers,
Rob

Replies are listed 'Best First'.
Re: Detecting whether UV fits into an NV
by stevieb (Canon) on Feb 26, 2020 at 01:45 UTC
    "It annoys me that I can't find a way to detect and shift all of the trailing zero bits off in one hit - and that I instead have to detect and shift them off one at a time. "

    Depending what platform you're on, check if the __builtin_ctz() function is available to you. The ctz stands for 'count trailing zeros'. Use this to get the number of trailing zeros, then shift them all off at once.

    Update: I've added in bitCount(), which allows you to send in the number after stripping off the trailing zeros, and returns a count of the remaining bits.

    #include <stdio.h> #include <math.h> unsigned int bitCount(int num); void main () { /* Here's your example number, in binary format for visual purpose +s */ int num = 0b10100000; int noTrailingZeros = num >> __builtin_ctz(num); unsigned int bitsRemain = bitCount(noTrailingZeros); printf( "Zeros: %d, No Zeros: %d, Bits remain: %d\n", num, noTrailingZeros, bitsRemain ); } unsigned int bitCount(int num) { unsigned int bits = 0; while (num) { bits++; num >>= 1; } return bits; }

    Output:

    Zeros: 640, No Zeros: 5, Bits remain: 3
Re: Detecting whether UV fits into an NV
by dave_the_m (Monsignor) on Feb 26, 2020 at 09:27 UTC
    Why not just cast the UV value to an NV, cast back to an UV, and see if the first and final values differ?

    Dave

      Why not just cast the UV value to an NV, cast back to an UV, and see if the first and final values differ?

      I think that checking whether the UV arg == (UV)((NV)arg) is a splendid idea.
      Interestingly, it's only very slightly faster (at least for the value ranges I've tested) than the XSub I posted at the beginning of this thread, but it's far, far simpler.

      As an aside, whilst this approach is quite simple to implement inside XS space, is it even possible to do inside perl space ?
      That is, inside a perl sub, how would one coerce a UV to an NV and then back to a UV ?
      (This aspect is not an issue for me - which is the reason that I've presented it "as an aside". Just curious as to if/how it's possible, that's all.)

      Thanks Dave.

      Cheers,
      Rob
        In general perl goes to great trouble to use the type which will be as lossless as possible rather than the type you want, so it wouldn't be easy. Maybe messing about with pack/unpack?

        Dave.

        I think that checking whether the UV arg == (UV)((NV)arg) is a splendid idea

        Unfortunately this check fails for some UV values with less recent Microsoft Compilers - eg Visual Studio 2010 and earlier.
        No problems with Visual Studio 2017 and Visual Studio 2019.
        I suspect that Visual Studio 2015 is the oldest Microsoft Compiler that doesn't suffer from this problem, but I haven't confirmed that.

        This particular issue looks like another manifestation of the problem that sent me down this path in the first place.
        Here's the demo I ran:
        use strict; use warnings; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'EOC'; int uv_fits_double1( UV arg ) { if(arg == (UV)((NV)arg)) return 1; return 0; } int uv_fits_double2(UV arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) return 1; return 0; } EOC my $ls = 63; my @in = ( 18446744073709549568, 18446744073709139968); for(@in) { print "$_: ", uv_fits_double1($_),uv_fits_double2($_), "\n" +; } __END__ With Windows Server 2003 SP1 Platform SDK, this outputs: 18446744073709549568: 01 18446744073709139968: 01 Base 2 representations of the 2 Uvs is (resp): 1111111111111111111111111111111111111111111111111111100000000000 1111111111111111111111111111111111111111111110011011100000000000 showing that both are exactly representable by a double.


        Cheers,
        Rob
        my $x = ( 1 << 54 ) | 1; say $x; # 18014398509481985 $x = unpack 'F', pack 'F', $x; $x = unpack 'J', pack 'J', $x; say $x; # 18014398509481984

        To support negative numbers, use I instead of J for negative numbers.


        The following is a little bit truer to a cast, but it's far more complicated, far more fragile (can fail for overloaded and magical vars), and technically uses XS:

        use strict; use warnings; use feature qw( say ); use B qw( svref_2object ); my $x = ( 1 << 54 ) | 1; say $x; # 18014398509481985 # Add the value as an NV to the scalar. { no warnings qw( void ); log $x; } # Make it so the scalar contains just an NV. $x = svref_2object(\$x)->NV; # Add the value as an IV or UV to the scalar. { no warnings qw( void ); $x | 1; } # Make it so the scalar contains just an IV or UV. $x = svref_2object(\$x)->int_value; say $x; # 18014398509481984
Re: Detecting whether UV fits into an NV
by stevieb (Canon) on Feb 26, 2020 at 02:33 UTC

    Here's a hastily thrown together short program that does all the work. Removes trailing zeros, counts the number of bits remaining in the number, and informs you whether the number fits within 53 bits.

    #include <stdio.h> #include <stdint.h> #include <math.h> #define MAXLEN 53 unsigned int bitCount(unsigned long num); uint8_t doesItFit(unsigned long num); void main () { int num = 0b10100000; unsigned long bigNum = 18446744073709551615; // 2^64-1 printf( "Fit? Small: %d, Big: %d\n", doesItFit(num), doesItFit(bigNum) ); } unsigned int bitCount(unsigned long num) { unsigned int bits = 0; while (num) { bits++; num >>= 1; } return bits; } uint8_t doesItFit (unsigned long num) { num = num >> __builtin_ctz(num); unsigned int numBits = bitCount(num); return numBits <= MAXLEN ? 1 : 0; }

    Output:

    Fit? Small: 1, Big: 0
Re: Detecting whether UV fits into an NV
by GrandFather (Saint) on Feb 26, 2020 at 03:29 UTC

    Is this an "It annoys me because"

    1. it doesn't look elegant
    2. my application needs it to be performant and even a short loop isn't
    3. some other reason I'm going to tell you about

    If it is case 1 then I suspect you are out of luck. The only other ways I can think of doing it are even worse, or hardware dependent, or are a maintenance nightmare.

    For case 2 there are techniques like binary searches that maybe can run faster or a really nasty looking series of if statements that amount to unrolling the while loop.

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
      I'm just looking at the lack of elegance - that's pretty much it.
      I've no objection to inelegant code (of which I've written heaps) if it needs to be that way.
      If not for the portability concerns, I'd pick up on stevieb's suggestion and just have the XSub as :
      int uv_fits_double(UV arg) { arg >>= __builtin_ctzll(arg); if(arg < 9007199254740993) return 1; return 0; }
      That seems to work fine on MinGW-built Windows perls, where $Config{ivtype} is 'long long', and I'd be happy with that even if it wasn't any more efficient.
      I didn't know about the existence of __builtin_ctz() and friends. Thanks, stevieb for bringing it to my attention. I'll do some benchmarking later on, and see how much time it saves.
      It's an option that could come in handy to me in the future.

      I guess that, mainly, I just want to know if there are some more elegant (&& portable) options available.

      Cheers,
      Rob
Re: Detecting whether UV fits into an NV
by jcb (Parson) on Feb 27, 2020 at 03:10 UTC

    If you have RAM to spare, (you are using perl, of course you do) you could build a 2048-entry lookup table, mask arg with 2047, shift once with the value from the table and then compare with 9007199254740993 once.

    Something like: (untested)

    static uint8_t ctz_table_11bit[2048] = { /* elided */ }; int uv_fits_double(UV arg) { arg >>= ctz_table_11bit[arg & 2047]; return arg < 9007199254740993; }

    The relative elegance of this solution is probably a matter of opinion, but it should run much faster, since it is theoretically a single basic block and should be a candidate for inlining. This trades considerable computation for a single memory read, which will have relative performance depending on your hardware.

    Here is some Perl code to generate that table: (I think this is the first time I have used formats; output spot checked for plausibility)

    #!/usr/bin/perl use strict; use warnings; use constant BITS => 11; our @pos; our @bits; our @val; format STDOUT_TOP= . $= = 1<<(BITS-1); format STDOUT = /* @### -> @0######### */ @#, /* @### -> @0######### */ @#, $pos[0], $bits[0], $val[0], $pos[1], $bits[1], $val[1], . print "static uint8_t ctz_table_",BITS,"bit[",1<<BITS,"] = {\n"; for (my $pos = 0; $pos < (1<<BITS); $pos += 2) { @pos = ($pos, $pos+1); @bits = map substr(unpack('B*', pack 'n', $_), -(BITS)), @pos; @val = map length((m/(0*?)\z/)[0]), @bits; write } print "};\n"; # EOF
    Update: change two magic numbers to use symbolic constant.
      Thanks - that's certainly one way to implement ctz. (Nice !!)

      In general, according to my benchmarking, there's very little speed gain in using this method in an XSub as compared to using this:
      int uv_fits_double (UV arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) return 1; return 0; }
      Obviously, for certain values there will be a significant gain, but for a random selection of values it works out at pretty much the same.

      For the record, I've based the above assertion on the output of the script that's presented below in the readmore section.

      Cheers,
      Rob

        I am not particularly surprised. The number in the table is also the number of times the loop will iterate in the simple implementation and literally half of the table entries are zero. Of the remaining half, half are one. Of that remaining half, half are two, and so forth. The cost of a memory read on modern hardware seems to be high enough that the lookup table does not give enough of an advantage. Perhaps the bit-twiddling solution roboticus offered will work better for you?

        Or is your testing script swamping the function under test with Perl-side type conversion and loop overhead? I suggest moving the iteration into C instead of walking a Perl array, since I suspect that both of the solutions in your test probably have running times dominated by the Perl for iterating over the array and the XS magic that converts an SV into a UV. Moving that iteration into C is probably necessary in any case: if you will be calling this predicate from Perl code, you need to know the overhead for that call itself to determine if optimizing uv_fits_double will actually provide any overall performance gain.

        Also note that uv_fits_double3 as presented here enters an infinite loop if given zero as an argument.

Re: Detecting whether UV fits into an NV
by roboticus (Chancellor) on Feb 27, 2020 at 14:31 UTC

    syphilis:

    I poked around with it and came up with something for you to figure it out without the looping.

    $ cat funky_bits.cc // funky_bits.cpp // // experiment with odd bit handling? // // 20200227 #include <stdio.h> // Some test cases unsigned long long foo[] = { 0xcab312, 0xdeadbeef, 0xdeadbeef0, 0xdeadbeef000, 0x138, 9007199254740992ULL, 18446744073709551615ULL }; // Needs a good name, and the type shenanigans should be cleaned up bool duzzit_fit(unsigned long long t) { // First we need to identify the lowest bit set long long neg_t = -(long long)t; long long last_set = t & neg_t; // Shift it left 52 bits to get location of lowest invalid bit // NOTE: If smallest invalid bit is far enough left, then this wil +l turn into 0 // making the invalid bits also 0, which happens to be OK! long long smallest_invalid = last_set << 52; // Convert it to a bit mask for the smallest invalid bit and those + to the left. unsigned long long valid_bits = smallest_invalid - 1; unsigned long long invalid_bits = ~valid_bits; // Uncomment for debugging //printf( " t:%16llx\n" // " last_set:%16llx\n" // " neg_t:%16llx\n" // "smallest_invalid:%16llx\n" // " valid_bits:%16llx\n" // " invalid_bits:%16llx\n", // t, last_set, neg_t, smallest_invalid, // valid_bits, invalid_bits); // It's a 'valid' number unless one of the upper-order invalid bit +s are set return ! (t & invalid_bits); } int main(int, char **) { for (int z=0; z<sizeof(foo)/sizeof(foo[0]); ++z) { unsigned long long i = foo[z]; bool b = duzzit_fit(i); printf("%lld: %s\n", i, b ? "OK" : "**NOPE**"); } return 0; } $ clang -o funky_bits funky_bits.cc $ ./funky_bits 13284114: OK 3735928559: OK 59774856944: OK 15302363377664: OK 312: OK 9007199254740992: OK -1: **NOPE**

    Cleaning it up, adding more test cases, further testing, debugging and simplification are all left as exercises for the reader... ;^D

    I was originally thinking of making a lookup table, but thought of this along the way.

    Update: fixed the abomination that was the first sentence.

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      I was originally thinking of making a lookup table, but thought of this along the way.

      Thanks for that - I still haven't worked my way through it properly.
      It seems to work fine now that I've changed:
      long long smallest_invalid = last_set << 52; to long long smallest_invalid = last_set << 53;
      Without that change, the following values (for example) wrongly reported "NOPE":
      1844674407366955264, 1844674407366955776, 1844674407366956288, 1844674407366956800, 1844674407366957312, 1844674407366957824, 1844674407366958336, 1844674407366958848, 1844674407366959360, 1844674407366959872, 1844674407366960384, 1844674407366960896, 1844674407366961408, 1844674407366961920, 1844674407366962432, 1844674407366962944, 1844674407366963456, 1844674407366963968, 1844674407366964480, 1844674407366964992
      In the following XS adaptation it's taking over twice as long as uv_fits_double3.
      use warnings; use Benchmark; use Inline C => Config => BUILD_NOISY => 1; use Inline C => <<'EOC'; int uv_fits_double3(SV * x, ...) { dXSARGS; int i, count = 0; UV arg; for(i = 0; i < items; i++) { arg = SvUV(ST(i)); if(arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) count++; } } return count; } int uv_fits_double_bitfiddle(SV * t, ...) { dXSARGS; int i, count = 0; for(i = 0; i < items; i++) { /* First we need to identify the lowest bit set */ IV neg_t = -(SvIV(ST(i))); IV last_set = SvUV(ST(i)) & neg_t; /* Shift it left 53 bits to get location of lowest invalid bit + * * NOTE: If smallest invalid bit is far enough left, then this wil +l * * turn into 0 making the invalid bits also 0, which happens to be + OK! */ IV smallest_invalid = last_set << 53; UV valid_bits = smallest_invalid - 1; UV invalid_bits = ~valid_bits; /* It's a 'valid' number unless one of the upper-order invalid bit +s are set */ if(!(SvUV(ST(i)) & invalid_bits)) count++; } return count; } EOC @in = (1844674407366955161 .. 1844674407378955161); # @in = (9007199248740992 .. 9007199260740992); # @in = (184467436737095 .. 184467448737095); # @in = (184463440737 .. 184475440737); die "wrong size" unless @in == 12000001; timethese (1, { 'uv_fits_double3' => '$count1 = uv_fits_double3(@in);', 'uv_fits_double_bitfiddle' => '$count2 = uv_fits_double_bitfiddle(@in) +;', }); print "$count1 $count2\n"; die "Error in at least one of the XSubs" unless $count1 == $count2; __END__
      Benchmarking results on my Windows 7 machine with perl-5.30.0 are:
      Benchmark: timing 1 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 0 wallclock secs ( 0.44 usr + 0.00 sys = 0.44 CPU) + @ 2.29/s (n=1) (warning: too few iterations for a reliable count) uv_fits_double_bitfiddle: 1 wallclock secs ( 0.94 usr + 0.00 sys = +0.94 CPU) @ 1.07/s (n=1) (warning: too few iterations for a reliable count) 12000001 12000001
      Can you see room for significant improvement in performance of uv_fits_double_bitfiddle ?

      Thanks for another interesting approach !!

      Cheers,
      Rob

        Rob:

        I'm not experienced in using Inline::C, but I thought I'd take a look at it. My first guess is that the overhead in SvUV(ST(i)) is causing twiddle to be slower, so I tweaked the routine to explicitly pull arg into a UV as you did in double3:

        $ diff pm_11113750_orig.pl pm_11113750_a.pl 26a27 > UV arg; 30,31c31,33 < IV neg_t = -(SvIV(ST(i))); < IV last_set = SvUV(ST(i)) & neg_t; --- > arg = SvUV(ST(i)); > IV neg_t = -arg; > IV last_set = arg & neg_t; 42c44 < if(!(SvUV(ST(i)) & invalid_bits)) count++; --- > if(!(arg & invalid_bits)) count++;

        Doing that change alone made the routines roughly the same speed:

        $ PERL5OPT= perl pm_11113750_orig.pl Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 7 wallclock secs ( 6.70 usr + 0.00 sys = 6.70 CPU) + @ 0.75/s (n=5) uv_fits_double_bitfiddle: 17 wallclock secs (17.36 usr + 0.00 sys = 1 +7.36 CPU) @ 0.29/s (n=5) 46875 46875 $ PERL5OPT= perl pm_11113750_a.pl Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 7 wallclock secs ( 6.75 usr + 0.00 sys = 6.75 CPU) + @ 0.74/s (n=5) uv_fits_double_bitfiddle: 6 wallclock secs ( 6.67 usr + 0.00 sys = +6.67 CPU) @ 0.75/s (n=5) 46875 46875

        Since the twiddle part doesn't have any loops, I would expect it to be significantly faster than a version *with* loops, depending on how clever the optimizer is. So another version I played with was to modify the routine to take two parameters and count the number of values that fit between those two values. That way, I could greatly reduce the impact of the SvUV() function at the cost of only being able to process sequences of numbers. That version confirmed my suspicion that the SvUV() function is slow enough that it swamps the difference in our algorithms.

        int uv_fits_double3x(SV* the_min, SV* the_max) { dXSARGS; UV i_min = SvUV(the_min); UV i_max = SvUV(the_max); UV i; int count = 0; UV arg; for(i = i_min; i < i_max; i++) { arg = i; if(arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) count++; } } return count; } int uv_fits_double_bitfiddle_3x(SV* the_min, SV* the_max) { dXSARGS; UV i_min = SvUV(the_min); UV i_max = SvUV(the_max); UV i; int count = 0; for(i = i_min; i < i_max; i++) { UV arg = i; IV neg = -arg; IV smallest_invalid = (arg & -arg)<<53; UV valid_bits = smallest_invalid-1; UV invalid_bits = ~valid_bits; if (! (arg & invalid_bits)) count++; } return count; }

        Both routines are much faster if they only take two arguments and iterate over the entire range. Only then does the twiddle version show a significant speed advantage at about twice as fast. The double3x version is about 1800 times faster, while fiddle3x is about 2900 times faster:

        $ PERL5OPT= perl pm_11113750_x.pl Name "main::count4" used only once: possible typo at pm_11113750_x.pl +line 147. === Test 1: roughly equivalent speed Benchmark: timing 5 iterations of uv_fits_double3, uv_fits_double_bitf +iddle... uv_fits_double3: 28 wallclock secs (27.22 usr + 0.08 sys = 27.30 CPU) + @ 0.18/s (n=5) uv_fits_double_bitfiddle: 27 wallclock secs (26.72 usr + 0.00 sys = 2 +6.72 CPU) @ 0.19/s (n=5) 33046878 33046878 (48000004) === TEST 2: twiddle has an advantage if we don't use SvUV() Benchmark: timing 10000 iterations of bitfiddle_3x, double3x, empty_fu +nc, empty_loop... bitfiddle_3x: 18 wallclock secs (18.39 usr + 0.00 sys = 18.39 CPU) @ +543.74/s (n=10000) double3x: 30 wallclock secs (29.55 usr + 0.00 sys = 29.55 CPU) @ 33 +8.44/s (n=10000) empty_func: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count) empty_loop: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count) 33046875 33046875 48000000 1416858175

        Unfortunately, I've not been successful in making a useful version that's faster in a real-world scenario. My current understanding of things (after playing with it for a few hours) is that the single argument versions get swamped out by the looping and function calling overhead. The functions are just too simple to be aHowever, I've not been successful in making a useful version that's significantly faster (with call overhead and such).

        NOTES: As you can tell from the above, I've not been successful in measuring the function call (empty func) or empty loop versions of the functions, so my benchmarking could be broken, so I'm including the current code, so people can point things out.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11113419]
Approved by GrandFather
Front-paged by haukex
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2024-04-23 22:23 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found