Re: Re: Re: Re: Advanced Sorting - GRT - Guttman Rosler Transform by demerphq (Chancellor)
on Nov 27, 2003 at 00:16 UTC
which seems to negate the use of GRT or ST sorting.
"Seems" being the key word here. :-)
It's true that significant work was done on Perl's sorting code between 5.6.1 and 5.8.0, and it's true that many special cases have now been optimized. In fact, some of the optimisations made in this period would cause the benchmark I originally posted to show bad results for the ST and GRT. The switch from quicksort to mergesort means that, on average, fewer comparisons are performed per sort. Because the "bare" variant does the tr/// inside every comparison, this has a direct effect on the results. Optimisations also appear to have made the ST _much_ more competitive with the GRT, although the GRT still wins in the benchmarks I have done. Optimisations also appear to have been made to tr/// in count mode, which makes it a most unsuitable benchmark candidate. Even worse (for the benchmark, that is; everybody else gets a win :-) is that mergesort behaves particularly well on almost-ordered data. My test set is relatively ordered because of the repetitive elements, so this has a particularly significant effect. Simply shuffling the records before the sort (after the replication) causes a dramatic change in the performance.
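Here is a minimal sketch of why the comparison count matters so much for the "bare" variant. It is written in Python rather than Perl, and the data set is hypothetical. The bare sort recomputes its key for both operands inside every comparison. A precomputed key is calculated once per record. Python's built-in sort is Timsort, a mergesort variant that, like the mergesort described above, exploits existing order. An already-ordered input therefore stands in for the almost-ordered test set. `expensive_key` is a hypothetical stand-in for tr/x// in count mode.

```python
import random
from functools import cmp_to_key

key_calls = 0  # how many times the key was computed during one sort

def expensive_key(s):
    """Hypothetical stand-in for tr/x// in count mode: counts 'x' characters."""
    global key_calls
    key_calls += 1
    return s.count("x")

def bare_sort(records):
    """'Bare' sort: the key is recomputed for both operands on every comparison.

    Perl analogue: sort { ($a =~ tr/x//) <=> ($b =~ tr/x//) } @data
    """
    def cmp(a, b):
        ka, kb = expensive_key(a), expensive_key(b)
        return (ka > kb) - (ka < kb)
    return sorted(records, key=cmp_to_key(cmp))

def precomputed_sort(records):
    """Key computed once per record (Python's key= decorates internally)."""
    return sorted(records, key=expensive_key)

def count_key_calls(sort_fn, records):
    """Run sort_fn on records and return how often expensive_key was called."""
    global key_calls
    key_calls = 0
    sort_fn(records)
    return key_calls

# Hypothetical data: a few distinct strings replicated many times.
rng = random.Random(42)
distinct = ["x" * rng.randrange(30) + "-" * rng.randrange(30) for _ in range(20)]
records = distinct * 50                                 # 1000 records
ordered = sorted(records, key=lambda s: s.count("x"))   # already-ordered input
shuffled = records[:]
rng.shuffle(shuffled)                                   # same records, random order

for name, data in (("ordered", ordered), ("shuffled", shuffled)):
    print(name,
          "bare:", count_key_calls(bare_sort, data),
          "precomputed:", count_key_calls(precomputed_sort, data))
```

The precomputed count equals the number of records regardless of input order. The bare count tracks the number of comparisons, and that number is what a mergesort reduces on ordered input. This is why a benchmark built on ordered, repetitive data can make the bare sort look better than it usually is.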
What all of this means is not that the ST and GRT are "negated", but rather that the circumstances under which they are useful have been reduced. This is a good thing. However, the fact remains that given a relatively expensive comparison function, the ST and GRT still win, and the GRT still beats the ST. This can be clearly seen by replacing the calls to tr/// with a subroutine that does the same thing.
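For reference, here is a sketch of both transforms with the key computed by a subroutine. It is written in Python, and the Perl idiom each function mirrors appears in its docstring. `count_x` is a hypothetical key subroutine standing in for the tr/// call. The ST caches the key but still runs a comparison callback on the cached values. The GRT packs the key into a fixed-width big-endian prefix so that the default bytewise comparison does all the work with no callback at all. This assumes every key fits in an unsigned 32-bit integer.

```python
import struct
from functools import cmp_to_key

def count_x(s):
    """Hypothetical key subroutine standing in for tr/x// (counts 'x' characters)."""
    return s.count("x")

def st_sort(records):
    """Schwartzian Transform.

    Perl: map { $_->[1] } sort { $a->[0] <=> $b->[0] } map { [count_x($_), $_] } @data
    """
    decorated = [(count_x(r), r) for r in records]  # compute each key exactly once
    # The comparison callback only compares the cached numeric keys.
    decorated.sort(key=cmp_to_key(lambda a, b: (a[0] > b[0]) - (a[0] < b[0])))
    return [r for _, r in decorated]                # undecorate

def grt_sort(records):
    """Guttman-Rosler Transform.

    Perl: map { substr($_, 4) } sort map { pack('N', count_x($_)) . $_ } @data
    """
    # A 4-byte big-endian unsigned prefix makes bytewise order equal numeric order.
    packed = [struct.pack(">I", count_x(r)) + r.encode("utf-8") for r in records]
    packed.sort()                                   # plain bytewise sort, no callback
    return [p[4:].decode("utf-8") for p in packed]  # strip the 4-byte key prefix

data = ["axxb", "x", "", "xxx", "bx"]
print(st_sort(data))   # ['', 'x', 'bx', 'axxb', 'xxx']
print(grt_sort(data))  # ['', 'bx', 'x', 'axxb', 'xxx']
```

The two results differ only for records with equal keys. Python's sort is stable, so the ST keeps them in input order. The GRT orders them by the record bytes, because the record itself is part of what gets compared.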
Yes, perl has gotten "better" at sorting. No, the ST and GRT are not redundant now. However, given the test results I've seen so far, I would probably not bother with the GRT. The ST would appear to give nearly the same performance, and it is a lot easier to handle.

When you play around with things, using different comparison functions, different data sets and distributions, etc., you see that the GRT and ST still beat the "bare" sort. And that's the point here: if the comparison function is expensive, precalculate it. Optimisations in perl may make a given example not behave as expected (which just reinforces the point that benchmarking should happen after the code is complete and not before), but overall, reducing the cost of the comparison still wins you some time. Thus IMO the ST and its derivatives (GRT) will be useful tools for a long time to come.
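A rough timing harness can help check that claim across different comparison costs and data distributions. This Python sketch is an illustration under stated assumptions, not the benchmark from this thread. `cheap_key`, `expensive_key`, the word generator, and the three distributions are all hypothetical. Python's `key=` argument, which is documented to call the key function once per element, stands in for the precomputing transforms. Timings depend on the machine, so no numbers are shown.

```python
import random
import timeit
from functools import cmp_to_key

def cheap_key(s):
    """Hypothetical cheap key: the string length (a single C-level call)."""
    return len(s)

def expensive_key(s):
    """Hypothetical expensive key: a per-character loop in Python."""
    return sum(ord(c) for c in s)

def bare_sort(records, key):
    """Key recomputed for both operands inside every comparison."""
    def cmp(a, b):
        ka, kb = key(a), key(b)
        return (ka > kb) - (ka < kb)
    return sorted(records, key=cmp_to_key(cmp))

def precomputed_sort(records, key):
    """Key computed once per record, then sorted on the cached values."""
    return sorted(records, key=key)

def run(n=2000, repeat=3, seed=1):
    """Time both sorts for each key cost and data distribution (n must be >= 10)."""
    rng = random.Random(seed)
    words = ["".join(rng.choice("xyz-") for _ in range(rng.randrange(1, 40)))
             for _ in range(n)]
    rows = []
    for key_name, key in (("cheap", cheap_key), ("expensive", expensive_key)):
        datasets = {
            "shuffled": words,
            "ordered": sorted(words, key=key),               # favours mergesort
            "few distinct": [words[i % 10] for i in range(n)],  # replicated records
        }
        for data_name, d in datasets.items():
            # Both sorts are stable and use the same key, so the results must match.
            assert bare_sort(d, key) == precomputed_sort(d, key)
            t_bare = timeit.timeit(lambda: bare_sort(d, key), number=repeat)
            t_pre = timeit.timeit(lambda: precomputed_sort(d, key), number=repeat)
            rows.append((key_name, data_name, t_bare, t_pre))
    return rows

if __name__ == "__main__":
    for key_name, data_name, t_bare, t_pre in run():
        print(f"{key_name:>9} {data_name:>12}  bare {t_bare:.4f}s  "
              f"precomputed {t_pre:.4f}s")
```

The harness checks that both sorts agree before timing them, so a faster result can't come from a wrong ordering. The distributions and key costs are meant to be swapped out for the real workload once the code is complete.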
First they ignore you, then they laugh at you, then they fight you, then you win.