Can anyone point me to some simple examples?
There are some examples in the Tie::Judy test suite.
(The test suite consists of only one file.)
Cheers, Rob
Although the documentation for Tie::Judy seems quite lacking (i.e., there is none), the SYNOPSIS appears to be enough to get someone at least started. Have you looked there?
Perhaps give it a try, and if you have problems, come back here with specific questions.
Yes, I have looked at the documentation for Tie::Judy, but the example code is incomplete. I need a complete, functioning example.
I've never heard of Judy arrays before, but Tie::Judy provides a tie interface, which means you can use it like any other Perl hash after tying it.
use Tie::Judy;
tie my %judy, 'Tie::Judy';
If that's a problem, please look into perlintro#Hashes and maybe perldata for how to use Perl hashes. This should get you started.
All the other methods listed in the SYNOPSIS seem to exist as performance boosts for bulk operations.
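To address the request for a complete, functioning example, here is a minimal sketch. It assumes Tie::Judy is installed from CPAN; the batch methods (insert, retrieve, remove) are taken from the module's SYNOPSIS, so double-check them against the version you have installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Tie::Judy;    # requires the Tie::Judy module from CPAN

# Tie a hash to a Judy array; afterwards it behaves
# like an ordinary Perl hash.
tie my %judy, 'Tie::Judy';

# Plain hash operations work as usual:
$judy{cherry} = 3;
$judy{apple}  = 1;
$judy{banana} = 2;

print "banana => $judy{banana}\n";
delete $judy{cherry};

# Judy arrays keep keys ordered internally, so iteration
# should come back in sorted key order without an explicit sort:
for my $key (keys %judy) {
    print "$key => $judy{$key}\n";
}

# Bulk operations go through the underlying object, per the
# SYNOPSIS (these method names are from the docs, not verified here):
(tied %judy)->insert({ date => 4, elderberry => 5 });
my @values = (tied %judy)->retrieve(qw(apple date));
(tied %judy)->remove([ 'date' ]);
```

After the tie, everything a normal hash supports (store, fetch, delete, exists, keys, each) should just work; the object methods are only needed for the bulk-operation shortcuts.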
I've never heard of Judy Arrays before
Same for me. But from a first look at the Wikipedia article, it seems to me that Judy arrays offer a significant gain over hash tables only in some edge cases. Quoting Wikipedia:
Judy arrays are designed to minimize the number of expensive cache-line fills from RAM, and so the algorithm contains much complex logic to avoid cache misses as often as possible. Due to these cache optimizations, Judy arrays are fast, especially for very large datasets. On data sets that are sequential or nearly sequential, Judy arrays can even outperform hash tables, since, unlike hash tables, the internal tree structure of Judy arrays maintains the ordering of the keys.
And further:
Judy arrays are extremely complicated. The smallest implementations are thousands of lines of code. In addition, Judy arrays are optimized for machines with 64 byte cache lines, making them essentially unportable without a significant rewrite. In most applications the possible performance advantage is too small to justify the high complexity of the data structure implementation.
So, for me, it looks like a piece of code optimized for a very specific problem on a very specific CPU. To make things worse, there are very few independent sources. Wikipedia only links to a re-implementation found at code.google.com and one "independent performance comparison of Judy to Hash Tables". The latter does not look that promising. Quoting its summary:
As illustrated in this data, Judy's smaller size does not give it an enormous speed advantage over a traditional "trade size for speed" data structure. Judy has received countless man-hours developing and debugging 20,000 lines of code; I spent an hour or three writing a fairly standard 200-line hash table.
If your data is strictly sequential, you should use a regular array. If your data is often sequential, or approximately sequential (e.g. an arithmetic sequence stepping by 64), Judy might be the best data structure to use. If you need to keep space to a minimum--you have a huge number of associative arrays, or you're only storing very small values--Judy is probably a good idea. If you need a sorted iterator, go with Judy. Otherwise, a hash table may be just as effective, possibly faster, and much simpler.
Furthermore, there aren't many ways to improve Judy, but there's a lot of room to improve hash tables. The one tested here is just my standard "classic" hash table. I don't know what best practice really is these days. For example, Knuth describes numerous strategies for hashing to allow the table to be more full. [...]
I did not look at the code. But I flipped through the "Shop Manual" and had a laugh at the "10-MINUTE DESCRIPTION OF HOW JUDY ARRAYS WORK AND WHY THEY ARE SO FAST" that ends with "Well I cannot describe Judy in 10 minutes". One thing that I learned very early in my job is that you should be able to explain your product to anyone you meet in an elevator, within the time you spend in the elevator. That's basically a minute or less. The Shop Manual also fails to sell Judy arrays to me.
Don't get me wrong, it's good to have a Perl interface for Judy arrays, but I currently doubt that they are ready for production.
Code optimized for a very specific CPU, and furthermore, seemingly written so that it can't easily be ported. An algorithm offering only slight advantages, if any, over existing solutions. Small performance and memory gains. It may be worth going that way if you have maxed out your multi-million-dollar supercomputer. In all other cases, switching to a faster CPU, adding more memory, and using conventional algorithms looks easier and more maintainable to me.
Of course, I may be totally wrong and everyone should store everything in Judy arrays.
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Obligatory suggestion to check out PDL.