http://www.perlmonks.org?node_id=11113469


in reply to Detecting whether UV fits into an NV

If you have RAM to spare, (you are using perl, of course you do) you could build a 2048-entry lookup table, mask arg with 2047, shift once with the value from the table and then compare with 9007199254740993 once.

Something like: (untested)

static uint8_t ctz_table_11bit[2048] = { /* elided */ }; int uv_fits_double(UV arg) { arg >>= ctz_table_11bit[arg & 2047]; return arg < 9007199254740993; }

The relative elegance of this solution is probably a matter of opinion, but it should run much faster, since it is theoretically a single basic block and should be a candidate for inlining. This trades considerable computation for a single memory read, which will have relative performance depending on your hardware.

Here is some Perl code to generate that table: (I think this is the first time I have used formats; output spot checked for plausibility)

#!/usr/bin/perl use strict; use warnings; use constant BITS => 11; our @pos; our @bits; our @val; format STDOUT_TOP= . $= = 1<<(BITS-1); format STDOUT = /* @### -> @0######### */ @#, /* @### -> @0######### */ @#, $pos[0], $bits[0], $val[0], $pos[1], $bits[1], $val[1], . print "static uint8_t ctz_table_",BITS,"bit[",1<<BITS,"] = {\n"; for (my $pos = 0; $pos < (1<<BITS); $pos += 2) { @pos = ($pos, $pos+1); @bits = map substr(unpack('B*', pack 'n', $_), -(BITS)), @pos; @val = map length((m/(0*?)\z/)[0]), @bits; write } print "};\n"; # EOF
Update: change two magic numbers to use symbolic constant.

Replies are listed 'Best First'.
Re^2: Detecting whether UV fits into an NV
by syphilis (Archbishop) on Mar 02, 2020 at 08:07 UTC
    Thanks - that's certainly one way to implement ctz. (Nice !!)

    In general, according to my benchmarking, there's very little speed gain in using this method in an XSub as compared to using this:
    int uv_fits_double (UV arg) { while(!(arg & 1)) arg >>= 1; if(arg < 9007199254740993) return 1; return 0; }
    Obviously, for certain values there will be a significant gain, but for a random selection of values it works out at pretty much the same.

    For the record, I've based the above assertion on the output of the script that's presented below in the readmore section.

    Cheers,
    Rob

      I am not particularly surprised. The number in the table is also the number of times the loop will iterate in the simple implementation and literally half of the table entries are zero. Of the remaining half, half are one. Of that remaining half, half are two, and so forth. The cost of a memory read on modern hardware seems to be high enough that the lookup table does not give enough of an advantage. Perhaps the bit-twiddling solution roboticus offered will work better for you?

      Or is your testing script swamping the function under test with Perl-side type conversion and loop overhead? I suggest moving the iteration into C instead of walking a Perl array, since I suspect that both of the solutions in your test probably have running times dominated by the Perl for iterating over the array and the XS magic that converts an SV into a UV. Moving that iteration into C is probably necessary in any case: if you will be calling this predicate from Perl code, you need to know the overhead for that call itself to determine if optimizing uv_fits_double will actually provide any overall performance gain.

      Also note that uv_fits_double3 as presented here enters an infinite loop if given zero as an argument.

        Moving the for loop into C space does enable the improvement in speed (about 12%) offered by uv_fits_double_ctz to become visible.

        On my Windows 7 machine, perl-5.30.0, that improvement equates to the saving of 0.05 seconds in the processing of 12 million integer values.
        That makes it somewhat less compelling, but a saving is a saving, and I would therefore be in favour of that approach, as opposed to the approach that I took.
        I found that cutting the table size in half (by removing all zero entries) didn't measurably slow down uv_fits_double_ctz, in which we simply change the condition:
        arg >>= ctz_table_11bit[arg & 2047]; to if(!(arg & 1)) arg >>= ctz_table_11bit[(arg & 2047) / 2];
        I think I'd prefer using that reduced table (simply because it's half the size), though the altered line of code is perhaps not so readily intelligible.
        The table could of course be further reduced in size, but I found that doing so started to impact on the performance.

        Also note that uv_fits_double3 as presented here enters an infinite loop if given zero as an argument

        Probably worth keeping in mind ;-)

        I'll now take a look at the contribution provided by roboticus, as you've suggested.
        I haven't had a chance to get to it earlier.

        Thanks for the ideas and corrections !!

        Cheers,
        Rob