Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

Re^3: Detecting whether UV fits into an NV

by jcb (Priest)
on Mar 03, 2020 at 00:31 UTC ( #11113676=note: print w/replies, xml ) Need Help??

in reply to Re^2: Detecting whether UV fits into an NV
in thread Detecting whether UV fits into an NV

I am not particularly surprised. The number in the table is also the number of times the loop will iterate in the simple implementation and literally half of the table entries are zero. Of the remaining half, half are one. Of that remaining half, half are two, and so forth. The cost of a memory read on modern hardware seems to be high enough that the lookup table does not give enough of an advantage. Perhaps the bit-twiddling solution roboticus offered will work better for you?

Or is your testing script swamping the function under test with Perl-side type conversion and loop overhead? I suggest moving the iteration into C instead of walking a Perl array, since I suspect that both of the solutions in your test probably have running times dominated by the Perl for iterating over the array and the XS magic that converts an SV into a UV. Moving that iteration into C is probably necessary in any case: if you will be calling this predicate from Perl code, you need to know the overhead for that call itself to determine if optimizing uv_fits_double will actually provide any overall performance gain.

Also note that uv_fits_double3 as presented here enters an infinite loop if given zero as an argument.

Replies are listed 'Best First'.
Re^4: Detecting whether UV fits into an NV
by syphilis (Bishop) on Mar 04, 2020 at 04:07 UTC
    Moving the for loop into C space does enable the improvement in speed (about 12%) offered by uv_fits_double_ctz to become visible.

    On my Windows 7 machine, perl-5.30.0, that improvement equates to the saving of 0.05 seconds in the processing of 12 million integer values.
    That makes it somewhat less compelling, but a saving is a saving, and I would therefore be in favour of that approach, as opposed to the approach that I took.
    I found that cutting the table size in half (by removing all zero entries) didn't measurably slow down uv_fits_double_ctz, in which we simply change the condition:
    arg >>= ctz_table_11bit[arg & 2047]; to if(!(arg & 1)) arg >>= ctz_table_11bit[(arg & 2047) / 2];
    I think I'd prefer using that reduced table (simply because it's half the size), though the altered line of code is perhaps not so readily intelligible.
    The table could of course be further reduced in size, but I found that doing so started to impact on the performance.

    Also note that uv_fits_double3 as presented here enters an infinite loop if given zero as an argument

    Probably worth keeping in mind ;-)

    I'll now take a look at the contribution provided by roboticus, as you've suggested.
    I haven't had a chance to get to it earlier.

    Thanks for the ideas and corrections !!


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://11113676]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others contemplating the Monastery: (7)
As of 2020-05-29 08:32 GMT
Find Nodes?
    Voting Booth?
    If programming languages were movie genres, Perl would be:

    Results (168 votes). Check out past polls.