Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.
 
PerlMonks  

need help with Tie::Judy

by expo1967 (Sexton)
on Dec 05, 2020 at 19:05 UTC ( [id://11124721]=perlquestion: print w/replies, xml ) Need Help??

expo1967 has asked for the wisdom of the Perl Monks concerning the following question:

at the office I have been given the task of becominf familiar with some software that uses "Judy Arrays" by using the "Tie::Judy" module. so far I have not been able to find any examples on how to use Tie::Judy. Can anyone point me to some simple examples ? Thanks.

Replies are listed 'Best First'.
Re: need help with Tie::Judy
by syphilis (Archbishop) on Dec 05, 2020 at 23:23 UTC
    Can anyone point me to some simple examples ?

    There are some examples to be found in the Tie-Judy test suite.
    (The test suite consists only of one file.)

    Cheers,
    Rob
Re: need help with Tie::Judy
by stevieb (Canon) on Dec 05, 2020 at 19:10 UTC

    Although the documentation for Tie::Judy seems quite lacking (ie. there is none), the SYNOPSIS appears enough to get someone at least started. Have you looked there?

    Perhaps give it a try, and if you have problems, come back here with specific questions.

      Yes I have looked at the documentation for Tie::Judy and the example code is not complete. I need a complete functioning example.
        > I need a complete functioning example.

        try this

        use Tie::Judy; tie %fruit_color, 'Tie::Judy';
        A hash represents a set of key/value pairs:
        my %fruit_color = ("apple", "red", "banana", "yellow");
        You can use whitespace and the '=>' operator to lay them out more nicely:
        my %fruit_color = ( apple => "red", banana => "yellow", );
        To get at hash elements:
        $fruit_color{"apple"}; # gives "red"
        You can get at lists of keys and values with keys() and values().
        my @fruits = keys %fruit_color; my @colors = values %fruit_color;

        see also Re: need help with Tie::Judy

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: need help with Tie::Judy
by LanX (Saint) on Dec 05, 2020 at 19:41 UTC
    I've never heard of Judy Arrays before, but Tie::Judy provides a tie interface (sic), which means you can use it like any other Perl hash after tieing it.

    use Tie::Judy; tie %judy, 'Tie::Judy';

    If that's a problem, please look into perlintro#Hashes and maybe perldata on how to use Perl hashes. This should get you started.

    All the other methods listed in the Synopsis seem to be for performance boosts with bulk operations.

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      I've never heard of Judy Arrays before

      Same for me. But from a first look at the Wikipedia article, it seems to me that Judy arrays offer a significant gain over hash tables only for some edge cases. Quoting Wikipedia:

      Judy arrays are designed to minimize the number of expensive cache-line fills from RAM, and so the algorithm contains much complex logic to avoid cache misses as often as possible. Due to these cache optimizations, Judy arrays are fast, especially for very large datasets. On data sets that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys.

      And further:

      Judy arrays are extremely complicated. The smallest implementations are thousands of lines of code. In addition, Judy arrays are optimized for machines with 64 byte cache lines, making them essentially unportable without a significant rewrite. In most applications the possible performance advantage is too small to justify the high complexity of the data structure implementation.

      So, for me, it looks like a piece of code optimized for a very specific problem on a very specific CPU. To make things worse, there are very few independent sources. Wikipedia only links to a re-implementation found at code.google.com and one "independent performance comparison of Judy to Hash Tables". The latter does not look that promising. Quoting its summary:

      As illustrated in this data, Judy's smaller size does not give it an enormous speed advantage over a traditional "trade size for speed" data structure. Judy has received countless man-hours developing and debugging 20,000 lines of code; I spent an hour or three writing a fairly standard 200-line hash table.

      If your data is strictly sequential; you should use a regular array. If your data is often sequential, or approximately sequential (e.g. an arithmetic sequence stepping by 64), Judy might be the best data structure to use. If you need to keep space to a minimum--you have a huge number of associative arrays, or you're only storing very small values, Judy is probably a good idea. If you need an sorted iterator, go with Judy. Otherwise, a hash table may be just as effective, possibly faster, and much simpler.

      Furthermore, there aren't many ways to improve Judy, but there's a lot of room to improve hash tables. The one tested here is just my standard "classic" hash table. I don't know what best practice really is these days. For example, Knuth describes numerous strategies for hashing to allow the table to be more full. [...]

      I did not look at the code. But I flipped through the "Shop Manual" and a had a laugh at the "10-MINUTE DESCRIPTION OF HOW JUDY ARRAYS WORK AND WHY THEY ARE SO FAST" that ends with "Well I cannot describe Judy in 10 minutes". One thing that I've learned very early in my job is that you should be able to explain your product to anyone you meet in an elevator, within the time you spend in the elevator. That's basically within a minute or less. The shop manual also fails to sell Judy arrays to me.

      Don't get me wrong, it's good to have a perl interface for Judy arrays, but I currently doubt that they are ready for production.

      Code optimized for a very specific CPU, and further more, seemingly written so it can't be easily ported. An algorithm offering only slight advantages, if any, over existing solutions. Small performance and memory gains. It may be worth going that way if you have maxed out your multi-million dollar super computer. In all other cases, switching to a faster CPU, adding more memory, and use conventional algorithms looks easier and more maintainable to me.

      Of course, I may be totally wrong and everyone should store everything in Judy arrays.

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
        Look at the positive sides, the pic from real Judy is quite cute ... :)

        I really don't care why the OP's worksite wants those hashes.

        Sounds like lazy attempt to solve an XY question to me ...

        so not our business!

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: need help with Tie::Judy
by ikegami (Patriarch) on Dec 06, 2020 at 10:39 UTC

    I suspect that using Judy would be faster than using a module using a tied interface.

Re: need help with Tie::Judy
by perlfan (Vicar) on Dec 07, 2020 at 21:38 UTC
    Obligatory suggestion to check out PDL.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11124721]
Approved by marto
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others imbibing at the Monastery: (6)
As of 2024-04-23 08:29 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found