PerlMonks  

Perl XS portable uint32_t

by tachyon-II (Chaplain)
on Jun 06, 2008 at 11:27 UTC ( #690656=perlquestion )

tachyon-II has asked for the wisdom of the Perl Monks concerning the following question:

Last time I struggled with 64 bit perls, it was in the context of getting uint32 behaviour from pure Perl in order to get Math::Random::MT::Perl working. With a little help from a generous monk this was duly solved with an & 0xffffffff to constrain the 64 bits to 32.

In the course of updating Digest::JHash (a fast 32 bit hashing algorithm) I have now struck the same problem, but this time from the C/XS side. The issue is that once you start hashing using left bitshifts on unexpectedly 64 bit wide integers you blow out from 32 bits into the upper 32 bits, which results in the behaviour commonly called "does not work (TM)". The problem per se is that I need a *reliable* *portable* uint32_t. Now <stdint.h>, <inttypes.h> and <sys/types.h> all define your basic uint32_t BUT using inconsistent names and with no guarantee of only 32 bits. I spent some time working with the author of Math::Random::MT trying to get a portable way of declaring a uint32_t that you can then be assured is exactly 32 bits - no more, no less. We were unable to find a really portable solution. <stdint.h> is not even included with VCC on MSWin32 - but it has only been part of the standard since C99 ;-)

It struck me that I might be missing something obvious. The common thread across perls on many systems is of course perl. If for some reason there was a uint32 typedef lodged in the guts of perl I could just use that. The U32 type seems to fit the bill but will this be portable to 64 bit perls? From my reading the only guarantee is that it will be at least 32 bits wide. I really want exactly 32 bits.

Does anyone have a good solution to this problem?

If there is not a solid 32 bit type then I was contemplating a macro that is a no-op on 32 bit systems, or does & 0xffffffff on 64 bit systems. What is the best way of detecting a 64 bit perl, or more particularly an accidentally 64 bit wide uint32, in the context of a header if/else?

Replies are listed 'Best First'.
Re: Perl XS portable uint32_t
by brian_d_foy (Abbot) on Jun 06, 2008 at 15:02 UTC

    Most of the pain of taking over the maintenance of Crypt::Rijndael was solving this same problem. In that distro, see the rijndael.h file to see what I've done. Basically, I test for each operating system, architecture, or compiler and do the special thing for it to define UINT8 and UINT32. A lot of people have helped by sending in the magical #defines and right header files for their system. Also google my Crypt::Rijndael posts on use.perl for threads asking the same question.

    I now have several virtual machines set up so I can test on the problem systems, mostly Solaris and various Windows environments. That helped a lot since I didn't have to guess if something might work and send it off. I could play with the system. I also used HP's Test Drive stuff to check VMS. Too bad Sourceforge turned off their compile farm, which is the only reason I was still using them. ;(

    Good Luck :)

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
Re: Perl XS portable uint32_t
by salva (Canon) on Jun 06, 2008 at 14:13 UTC
    Alongside the perl.h header file installed on your system, you will find config.h (for instance, on my Debian box it is at /usr/lib/perl/5.10.0/CORE/config.h) containing all the information captured by Configure when perl was built.

    Specifically it has information about the size of most common C integer types. For instance:

    #define INTSIZE 4 /**/
    #define LONGSIZE 4 /**/
    #define SHORTSIZE 2 /**/

    BTW, the same information is available on the Perl side using the Config module.

    You could also explicitly discard the upper bits from a possible 64bits word using...

    v &= 0xffffffff;
    and hope that the optimizer removes the superfluous instructions from the final code on 32 bit architectures
Re: Perl XS portable uint32_t
by almut (Canon) on Jun 06, 2008 at 19:45 UTC
    If there is not a solid 32bit type then I was contemplating a macro that is a noop on 32 bit systems, or does & 0xffffffff on 64 bit systems.

    I think simply doing & 0xffffffff on 64-bit systems (whenever a value larger than 0xffffffff could be the result of an individual operation) is not such a bad idea... in case unsigned long happens to be wider than 32-bit (which can be determined easily, either at build- or at runtime). At least I would guess it's less headaches getting it to work portably, than trying to make provisions to always use the appropriate int type, where the compiler is making sure that exactly 32-bit are being used.

    The additional and-operations (53 of which are needed here) are typically very fast, so the associated performance penalty on 64-bit platforms is likely going to be acceptable.  In fact, I just benchmarked it. With the test routines computing the jhash for 100 random strings of length 100000, I get on average:

          Rate  u32  u64
    u32 54.9/s   -- -13%
    u64 63.3/s  15%   --

    where 'u32' is the version with the additional & 0xffffffff instructions, producing correct results, while 'u64' is the original version producing incorrect results with 64-bit ints.

    (Perl v5.8.8 built for x86_64-linux-thread-multi; gcc 4.1.0)

    With a string length of only 100 chars, the difference reduces to about 4%, because the function calling overhead becomes larger, relatively... The ~14% is close to the 'true' slow down attributable to the additional and-operations — in other words, it's roughly the asymptotic limit, no longer increasing significantly with greater string lengths.

      Hello again almut. I would rather not just compile in the & MASK32 if it is not needed. It is easy enough to set a define in the Makefile.PL if use64bitint is set:

      DEFINE => ($Config{use64bitint} ? '-DUSING_64_BIT_INT' : ''),

      Then in the C you can use this to provide 2 versions of the MIX macro. I may well be wrong (can't test) but it seems to me that none of the & MASK32 are required in the main jhash code, provided you use a macro setup as shown below. The a b c ints will probably overflow into the low bits of the 64 bit space but because all the operations are addition the low order bits will be the valid representation in 32 bit space. Could you give this a try:

      /* Need to constrain U32 to only 32 bits on 64 bit systems
       * For efficiency we only use the & 0xffffffff if required
       */
      #define USING_64_BIT_INT /* save messing with Makefile.PL define */

      #if defined(USING_64_BIT_INT)
      #define MIX(a,b,c) \
      { \
          a &= 0xffffffff; b &= 0xffffffff; c &= 0xffffffff; \
          a -= b; a -= c; a ^= (c>>13); a &= 0xffffffff; \
          b -= c; b -= a; b ^= (a<<8);  b &= 0xffffffff; \
          c -= a; c -= b; c ^= (b>>13); c &= 0xffffffff; \
          a -= b; a -= c; a ^= (c>>12); a &= 0xffffffff; \
          b -= c; b -= a; b ^= (a<<16); b &= 0xffffffff; \
          c -= a; c -= b; c ^= (b>>5);  c &= 0xffffffff; \
          a -= b; a -= c; a ^= (c>>3);  a &= 0xffffffff; \
          b -= c; b -= a; b ^= (a<<10); b &= 0xffffffff; \
          c -= a; c -= b; c ^= (b>>15); c &= 0xffffffff; \
      }
      #else
      #define MIX(a,b,c) \
      { \
          a -= b; a -= c; a ^= (c>>13); \
          b -= c; b -= a; b ^= (a<<8);  \
          c -= a; c -= b; c ^= (b>>13); \
          a -= b; a -= c; a ^= (c>>12); \
          b -= c; b -= a; b ^= (a<<16); \
          c -= a; c -= b; c ^= (b>>5);  \
          a -= b; a -= c; a ^= (c>>3);  \
          b -= c; b -= a; b ^= (a<<10); \
          c -= a; c -= b; c ^= (b>>15); \
      }
      #endif

      /* rest of code unchanged with no masking */

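The idea behind the masked MIX macro above can be checked in isolation. The following is only a sketch, not the Digest::JHash source: the names mix32, mix64_masked and mixes_agree are invented for illustration, and it assumes a compiler that ships <stdint.h>. It runs the nine mix steps once on a genuine 32-bit type and once on a 64-bit type with the per-line masking, then compares the results.

```c
#include <stdint.h>

#define MASK32 UINT64_C(0xffffffff)

/* Reference: the nine mix steps on a type that is exactly 32 bits wide. */
static void mix32(uint32_t *pa, uint32_t *pb, uint32_t *pc)
{
    uint32_t a = *pa, b = *pb, c = *pc;
    a -= b; a -= c; a ^= (c >> 13);
    b -= c; b -= a; b ^= (a << 8);
    c -= a; c -= b; c ^= (b >> 13);
    a -= b; a -= c; a ^= (c >> 12);
    b -= c; b -= a; b ^= (a << 16);
    c -= a; c -= b; c ^= (b >> 5);
    a -= b; a -= c; a ^= (c >> 3);
    b -= c; b -= a; b ^= (a << 10);
    c -= a; c -= b; c ^= (b >> 15);
    *pa = a; *pb = b; *pc = c;
}

/* Same steps in a 64-bit container, masking once up front and after each line. */
static void mix64_masked(uint64_t *pa, uint64_t *pb, uint64_t *pc)
{
    uint64_t a = *pa & MASK32, b = *pb & MASK32, c = *pc & MASK32;
    a -= b; a -= c; a ^= (c >> 13); a &= MASK32;
    b -= c; b -= a; b ^= (a << 8);  b &= MASK32;
    c -= a; c -= b; c ^= (b >> 13); c &= MASK32;
    a -= b; a -= c; a ^= (c >> 12); a &= MASK32;
    b -= c; b -= a; b ^= (a << 16); b &= MASK32;
    c -= a; c -= b; c ^= (b >> 5);  c &= MASK32;
    a -= b; a -= c; a ^= (c >> 3);  a &= MASK32;
    b -= c; b -= a; b ^= (a << 10); b &= MASK32;
    c -= a; c -= b; c ^= (b >> 15); c &= MASK32;
    *pa = a; *pb = b; *pc = c;
}

/* Returns 1 when both variants produce the same state for the given inputs. */
static int mixes_agree(uint32_t a, uint32_t b, uint32_t c)
{
    uint32_t a32 = a, b32 = b, c32 = c;
    uint64_t a64 = a, b64 = b, c64 = c;
    mix32(&a32, &b32, &c32);
    mix64_masked(&a64, &b64, &c64);
    return a32 == a64 && b32 == b64 && c32 == c64;
}
```

The two variants agree because every value that gets right-shifted has been masked on the line before, so no stray high bits can be shifted back down into the low 32.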
        I would rather not just compile in the & MASK32 if it is not needed.

        That's not what I was trying to say :)   Rather, I meant to imply two things: (a) it might be easier to just do the 32-bit masking yourself, in case you can't easily figure out what the respective magic non-standard __I32_ulong__t incantation is to get the proper 32-bit type on that yet unknown weird 64-bit platform/compiler combo, (b) the performance penalty of doing so is not as big as one might expect.

        But you're right, the amount of masking statements actually required may be simplified (I must admit, I hadn't put too much thought into the details here). In particular, masking a, b, c centrally at the beginning of the macro, saves you from having to do so at the end of various other code paths that you do come along...  Interestingly though, there's not much gain in performance from the reduced masking — I now get on average:

              Rate  u32  u64
        u32 56.2/s   -- -12%
        u64 63.7/s  13%   --

        (under otherwise identical conditions)

        Anyhow, I tested your simplified code suggestion, and it's working fine (at least on x86_64-linux).

        Thanks for adding another useful module to CPAN!

        It is easy enough to set a define in the Makefile.PL if use64bitint is set

        Just be a little cautious with $Config{use64bitint}. It doesn't always tell you what you want/need to know. On 32-bit systems where perl is built with -Duse64bitint, the 'long' and 'int' sizes can (and generally do, I believe) remain at 4 bytes.

        I think I've also seen perls built with -Dusemorebits (the equivalent of building with -Duse64bitint && -Duselongdouble) that have neither use64bitint nor uselongdouble defined.

        And finally, it would be possible to have 64-bit longs and ints in play without having built with use64bitint support (ie when 64 bits is the size of the long/int on the particular compiler being used).

        There are probably other aspects to consider as well. (See the INSTALL file that ships with the perl source for a more authoritative account.)

        Cheers,
        Rob
Re: Perl XS portable uint32_t
by syphilis (Archbishop) on Jun 06, 2008 at 13:10 UTC
    As regards Digest::JHash, isn't it just a matter of determining the size of unsigned long and unsigned int (preferably during pre-processing), and then proceeding accordingly ?

    That is, I'm suggesting that the questions you asked (which I don't feel competent to answer, btw) are irrelevant to Digest::JHash. All it really needs to know are the sizes of unsigned longs and unsigned ints - and sizeof() can provide that answer.

    Cheers,
    Rob
    Update: Aaah ... but sizeof() can't provide that info during pre-processing - which therefore adds a layer of complexity.

      Hi Rob,

      Unfortunately most hashing behaviour depends on the idiosyncrasies of overflow wrap and truncation implicit in a (32 bit) int. Consider this case (using a 4 bit "int" on a 4 and 8 bit machine):

      1111 << 1 = 1110 (4 bit)
      1111 << 1 = 00011110 (8 bit)

      Now consider what happens if we then go on to perform a rightshift

      1110 >> 2 = 0011 (4 bit)
      00011110 >> 2 = 00000111 (8 bit)
                           ^ Oops

      So after two identical operations the results on 4 vs 8 bit architecture now differ. Essentially by having the spare high order bits we do not lose those bits to the big bit bucket in the sky, so when we right shift they reappear. As a result any algorithm that uses much bitshifting will not work as desired if the int being used is not exactly the design width.
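The same effect can be reproduced at 32 vs 64 bits. This is a minimal sketch with made-up function names, assuming <stdint.h> is available: a left shift followed by a right shift in a true 32-bit type, in an unmasked 64-bit container, and in a 64-bit container that is masked after the left shift.

```c
#include <stdint.h>

/* True 32-bit type: the bits shifted out at the top are gone for good. */
static uint32_t roundtrip_32(uint32_t x)
{
    x <<= 2;
    x >>= 2;
    return x;
}

/* 64-bit container, no masking: the "lost" bits survive and come back. */
static uint64_t roundtrip_64(uint64_t x)
{
    x <<= 2;
    x >>= 2;
    return x;
}

/* 64-bit container with a mask after the left shift: matches the 32-bit result. */
static uint64_t roundtrip_64_masked(uint64_t x)
{
    x = (x << 2) & UINT64_C(0xffffffff);
    x >>= 2;
    return x;
}
```

Called with 0xffffffff, the first and third functions return 0x3fffffff while the second returns 0xffffffff unchanged.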

      Unfortunately you can't use sizeof in a preprocessor directive to do the setup one way on a 32 bit machine and another way on a 64 bit one.
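Although sizeof is unavailable to the preprocessor, the limits from <limits.h> are ordinary integer constants and can be tested in #if. The sketch below is one possible way to use that, not something taken from Digest::JHash: the names jh_u32, JH_MASK32 and jh_shl are hypothetical, and it has not been tried on the unusual platforms discussed in this thread.

```c
#include <limits.h>

/* Pick a type by its maximum value; fall back to masking if nothing is 32 bits. */
#if UINT_MAX == 0xffffffffUL
typedef unsigned int jh_u32;
#define JH_MASK32(x) (x)                    /* already exactly 32 bits: no-op */
#elif ULONG_MAX == 0xffffffffUL
typedef unsigned long jh_u32;
#define JH_MASK32(x) (x)                    /* already exactly 32 bits: no-op */
#else
typedef unsigned long jh_u32;               /* wider than 32 bits */
#define JH_MASK32(x) ((x) & 0xffffffffUL)   /* discard the upper bits by hand */
#endif

/* Left shift that behaves as if jh_u32 were exactly 32 bits wide. */
static jh_u32 jh_shl(jh_u32 x, unsigned n)
{
    return JH_MASK32(x << n);
}
```

With this arrangement the mask only costs anything on platforms where neither unsigned int nor unsigned long is 32 bits wide.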

      Cheers

      tachyon

        The example you provided doesn't match what I'm finding with my 64-bit Microsoft Platform SDK for Windows Server 2003 R2 compiler on Vista 64. This compiler has 32-bit longs and ints - yet those high order bits are, I think, being lost to the "big bit bucket in the sky":
        use warnings;
        use Inline C => Config => BUILD_NOISY => 1;

        use Inline C => <<'EOC';

        void foo() {
            unsigned long x = 0xffffffff;
            printf("long: %d\nint: %d\n", sizeof(long), sizeof(int));
            printf("%x\n", x);
            x <<= 2;
            printf("%x\n", x);
            x >>= 2;
            printf("%x\n", x);
        }

        EOC

        foo();

        __END__
        Outputs:
        long: 4
        int: 4
        ffffffff
        fffffffc
        3fffffff

        Maybe this behaviour is not reliable across the full range of compilers/systems/architectures. (I honestly wouldn't know.)

        Unfortunately you can't use sizeof in a preprocessor directive to do the setup one way on a 32 bit machine and another way on a 64 bit one

        Yes - I was finding that out for myself (probably as you were writing your reply :-)

        However, you can have the Makefile.PL query $Config{intsize} and $Config{longsize}. And the Makefile.PL can then define symbols (based on those config values) that the pre-processor can make use of.

        Cheers,
        Rob
Re: Perl XS portable uint32_t
by chrstphrchvz (Scribe) on Jun 24, 2018 at 09:17 UTC

    This is extremely old, but I don't believe anyone set the record straight regarding the false premise that a uint32_t does not guarantee exactly 32-bits.

    A uint32_t is guaranteed to be exactly 32-bits. If an architecture has no way of meeting that requirement, then a compiler must not make uint32_t available; it would be in violation of C99 if it did.
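In C99 terms, an implementation that provides uint32_t also defines the UINT32_MAX macro, so availability can be tested in the preprocessor. The following is only a sketch under that assumption (a C99 <stdint.h> being present, which the thread notes was not the case for every compiler in 2008); jh_u32 and wrap_add are names made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

#if defined(UINT32_MAX)
typedef uint32_t jh_u32;    /* exact-width type is available */
#else
#error "this platform has no exact-width 32-bit unsigned type"
#endif

/* Compile-time check: the array size is -1 (an error) unless jh_u32 is 32 bits. */
typedef char jh_u32_is_32_bits[(sizeof(jh_u32) * CHAR_BIT == 32) ? 1 : -1];

/* Addition wraps modulo 2^32 without any explicit masking. */
static jh_u32 wrap_add(jh_u32 a, jh_u32 b)
{
    return a + b;
}
```

On a platform without such a type the build stops at the #error, which is the point where a masking fallback would have to take over.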

    Just thought this needed to be said. Carry on...
