Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Re^2: Bidirectional lookup algorithm? (Judy)

by tye (Sage)
on Jan 11, 2015 at 22:08 UTC ( [id://1112910]=note: print w/replies, xml ) Need Help??


in reply to Re: Bidirectional lookup algorithm? (Solution.)
in thread Bidirectional lookup algorithm? (Updated: further info.)

It seems possible that Judy arrays would reduce the space requirement further and (possibly) increase the performance. Just a shame that the Build mechanism is broken.

Since Judy seemed the most promising approach in my personal opinion, I did take a quick look at the complexities of getting the Perl module working under Windows. There seems to be at least 3 ways to build the Judy library itself under Windows:

  1. Cross-compile it on Unix.
  2. Configure it by hand and install GNU Make (and maybe gcc) to build it.
  3. Use the included *.bat file and build it using Visual Studio.

The last choice seems, by far, the easiest. The code included in Alien::Judy even includes the *.bat file. But since I didn't already have Visual Studio, I stopped looking at that point.

I didn't reply as what I discovered was found easily so I guessed you'd either find the same thing soon enough or wouldn't be that interested in using Judy.

Your node above (thanks for sharing your implementation) makes me think you may still be interested in Judy and also didn't yet find the *.bat file. So that little bit of information may encourage you and/or just save you some small bit of research.

The next steps (getting Judy to not fail to build just because Alien::Judy isn't installed and getting the build process for Judy to find the needed include and library files) seem to me to likely be fairly simple.

I may still eventually get around to jumping through those hoops (though, to be honest, there are a lot of higher-priority items on my to-do list right now). If I do, I'll post more details. If you do, please also post what you find.

Thanks.

- tye        

Replies are listed 'Best First'.
Re^3: Bidirectional lookup algorithm? (Judy)
by BrowserUk (Patriarch) on Jan 14, 2015 at 22:02 UTC

    I finally managed to build a 64-bit version of Judy.

    I started with this one-file/1250 line version and hacked out all the -DSTANDALONE and -DASKITIS stuff along with all the BIG_ENDIAN stuff; extracted a Judy.h; and got the filesize down to 965 lines and built it into a dll:

    C:\test\Judy>cl /W3 /Ot /favor:INTEL64 /MT /LD Judy.c Microsoft (R) C/C++ Optimizing Compiler Version 15.00.21022.08 for x64 Copyright (C) Microsoft Corporation. All rights reserved. Judy.c Microsoft (R) Incremental Linker Version 9.00.21022.08 Copyright (C) Microsoft Corporation. All rights reserved. /out:Judy.dll /dll /implib:Judy.lib Judy.obj Creating library Judy.lib and object Judy.exp

    I then wrote a C program to us it to create two Judy arrays and stored my test data 'aaaaa'..'zzzzz' paired with a 64-bit integer:

    built it against the dll:

    C:\test\Judy>cl /W3 /Ot JudyTest.c Judy.lib Microsoft (R) C/C++ Optimizing Compiler Version 15.00.21022.08 for x64 Copyright (C) Microsoft Corporation. All rights reserved. JudyTest.c Microsoft (R) Incremental Linker Version 9.00.21022.08 Copyright (C) Microsoft Corporation. All rights reserved. /out:JudyTest.exe JudyTest.obj Judy.lib

    A run:

    C:\test\Judy>JudyTest.exe aaaaa Check memory: Bidi lookup of 11881376 pairs took: 6.325 seconds Check memory: 524,332k

    Then I built it as an Inline::C module, adding method wrappers for the important functions:

    Unfortunately, in this form, the runtime increase -- mostly I think due to the perl->C->perl transitions -- from 6.5 seconds to over 25s:

    C:\test\Judy>perl Judy.pm -ODO=aaaaa Memory before building Judy: 10,760 K Memory after building Judy: 347,660 K Bidi lookups of 11881376 pairs took:25.197204113 seconds Memory after lookups: 347,680 K

    So, whilst it does use somewhat less memory than my BiMap version; is also somewhat slower.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      I finally managed to build a 64-bit version of Judy

      For the record, I finally managed to build a static 64-bit library, too. (Credit to BrowserUk and oiskuu.)
      oiskuu suggested that my problems with non-constant expressions arose because the compiler chose not to fold constant expression where undefined behaviour is involved - and that I should use (Word_t)0x100 instead.
      Doing that allowed the build to proceed, and 'make check' passed its tests.

      However, I was still seeing a number of "left shift count >= width of type" warnings, and I don't know how thorough the test suite (which completes very quickly) is.
      I therefore don't have a lot of confidence that the library is going to behave reliably in the real world.

      With the Judy-0.41 patches that anonymonk posted I was able to successfully build the module on 32-bit perls.
      However, the Judy module that I built against the 64-bit library loaded ok, but kept crashing during the running of its test suite.
      In Judy.xs, I changed the hard coded "long" casts to "long long" casts. That got rid of lots of warnings during the build, but the crashing persisted.
      The generated Judy.c file still contained some casts to "unsigned long int" , but I couldn't immediately find what was generating those casts - at which point I totally lost interest.

      Cheers,
      Rob
        For the record, I finally managed to build a static 64-bit library, too.

        Well done. And thank you for trying. Was that with Mingw or MSVC?

        I never managed to get beyond the Too early to specify a build action 'installdeps'. Do 'Build installdeps' instead. idiocy using MSVC.

        However, the Judy module that I built against the 64-bit library loaded ok, but kept crashing during the running of its test suite. In Judy.xs, I changed the hard coded "long" casts to "long long" casts. That got rid of lots of warnings during the build, but the crashing persisted. The generated Judy.c file still contained some casts to "unsigned long int" , but I couldn't immediately find what was generating those casts - at which point I totally lost interest.

        It's very capable code, but the very epitome of 'macro abuse'. The only way to work out what code is actually being compiled is to use /E and wade through the post processed source, but by then, you've no idea where the code came from; or what combination of #defines, #ifdefs and #ifndefs caused it.

        Like trying to listen to music by looking at an oscilloscope trace.

        That's why when I came across the 1 file version I linked, I jumped at it. It at least gave me a way to get a quick feel for the performance and memory usage -- both of which are very impressive -- until you have to make the Perl->XS->C and back transitions for every lookup :(

        Its not just the extra layers of function calls -- which I've tried to negate by forcing the functions to be inlined -- it also comes down to the fact that Perl code messes up the code and data caches as it leaps about all over memory chasing opcode trees. As the basis of Judy Array performance is the minimisation of cache misses, screwing with the cache between each access to those arrays, pretty much undoes everything the author worked so hard to achieve.

        Also, I think that my need to use two Judy arrays in parallel does nothing to enhance the performance.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
Re^3: Bidirectional lookup algorithm? (Judy)
by Anonymous Monk on Jan 12, 2015 at 00:07 UTC
      C:\citrusperl\mingw\msys\bin\sh.EXE configure --enable-static --disable-shared prefix=C:/dev/judyjudy/judyprefix
      C:\citrusperl\mingw\msys\bin\make.EXE install


      Have you built a 64-bit Windows build of Judy-1.0.5 ?
      I had no trouble with 32-bit Windows builds, but with the 64-bit build I get lots of the following warnings:
      warning: right shift count >= width of type [enabled by default]
      and
      warning: left shift count >= width of type [enabled by default]
      Pretty early on, make reports the following errors:
      JudyLInsArray.c:137:5: error: initializer element is not constant JudyLInsArray.c:137:5: error: (near initialization for 'subexp_mask[5] +') JudyLInsArray.c:138:5: warning: left shift count >= width of type [ena +bled by default] ~cJU_POP0MASK(6), ^ JudyLInsArray.c:138:5: error: initializer element is not constant JudyLInsArray.c:138:5: error: (near initialization for 'subexp_mask[6] +') JudyLInsArray.c:139:5: warning: left shift count >= width of type [ena +bled by default] ~cJU_POP0MASK(7), ^ JudyLInsArray.c:139:5: error: initializer element is not constant JudyLInsArray.c:139:5: error: (near initialization for 'subexp_mask[7] +')
      which, of course, soon leads to:
      make[3]: *** [JudyLInsArray.lo] Error 1 make[3]: Leaving directory `/c/_64/comp/judy-1.0.5/src/JudyL' make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory `/c/_64/comp/judy-1.0.5/src' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/c/_64/comp/judy-1.0.5' make: *** [all] Error 2
      I had a quick but unsuccessful attempt to google up something about these errors.
      Do you happen to know what the fix is ?

      Cheers,
      Rob

        From the Judy Arrays web page:

        The Judy family of functions supports fully dynamic arrays. These arrays may be indexed by a 32- or 64-bit word (depending on processor word size), a null terminated string or an array-of-bytes plus length. A dynamic array (sparsely populated) can also be thought of as a mapping function or associative memory.

        A Word_t is a typedef unsigned long int in Judy.h and must be the same size as sizeof(void *) I.E. a pointer.

        unsigned long int is a 4-byte integer on 64-bit Windows.

        This seems to be the 64-bit data model difference between Windows (LLP64) and *nix (LP64) raising its ugly head.

        Word_t should be typedef'd as long long in order to ensure that sizeof(Word_t) == sizeof(void*) on both platforms.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      ... ought to work on strawberry provided you have msys ...

      Thanks for your efforts.

      Unfortunately, getting a completely different version and flavour of perl + mingw + msys into my customer's set up is a fight I just don't need.


      With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
        FWIW these are the memory savings as reported by pslist
        it begins 0.734375 WVM: 21976 { WS: 5948 VM: 3164 } after perl add 2.078125 WVM: 157152 { WS: 120132 VM: 117276 } after perl get 0.75 WVM: 157152 { WS: 120140 VM: 117276 } after free perl 1.5 WVM: 153184 { WS: 74936 VM: 72072 } it begins 0.4375 WVM: 21976 { WS: 5948 VM: 3164 } after jAdd 3.734375 WVM: 50648 { WS: 35500 VM: 32644 } after jGet 3.84375 WVM: 50648 { WS: 35512 VM: 32644 } after jFree 1.171875 WVM: 50716 { WS: 7988 VM: 5092 }

        And the Judy code

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1112910]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others avoiding work at the Monastery: (6)
As of 2024-03-29 12:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found