Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??

which seems to negate the use of GRT or ST sorting.

Seems being the key word here. :-)

Its true that signifigant work was done to Perls sorting code between 5.6.1 and 5.8.0. Its true that many special cases have now been optmized. In fact it turns out that some of the optimisations that have occured in this period would cause the benchmark I originally posted show bad results for ST and GRT. The switch from quicksort to mergesort means that on average less comparisons are performed per sort and as the "bare" variant does the tr/// per comparison this has a adirect effect on the results. It also appears that optimisations have occured that make the ST _much_ more competitive with the GRT (GRT still wins in the benchmarks I have done however.) Also it appears optimisations have been done on tr/// in count mode, making it a most unsuitable benchmark candidate. Even worse (for the benchmark that is, everybody else gets a win :-) is that mergesort behaves particularly well on almost ordered data. As my test set is relatively ordered (due to the repetive elements) this has a particularly signifigant effect. Simply shuffling the records before the sort (after the replication) causes a dramatic change in the performance.

What all of this means is not that the ST and GRT are "negated" but rather that the circumstances under which they are useful is reduced. This is a good thing. However, the fact still remains that given a relatively expensive comparison function the ST and GRT still win, and the GRT still beats the ST. This can be clearly seen by replacing the calls to tr/// with a subroutine that does the same thing.

Yes, perl has gotten "better" at sorting. No, the ST and GRT are not redundant now. However given the test results I've seen so far I would probably not bother with the GRT. The ST would appear to be nearly the same performance, and a lot easier to handle.When you play around with things, using different comparsion functions, different data sets and distributions, etc you see that the GRT and ST still beat the "bare" sort. And thats the point here. If the comparsion function is expensive, precalculate it. Optimisations in perl may make a given example not behave as expected (which just reinforces the point that benchmarking should happen after the code is complete and not before) but overall, reducing the cost of the comparison still wins you some time. Thus IMO the ST and its derivatives (GRT) will be useful tools for a long time to come.

Cheers,


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi



In reply to Re: Re: Re: Re: Advanced Sorting - GRT - Guttman Rosler Transform by demerphq
in thread Advanced Sorting - GRT - Guttman Rosler Transform by demerphq

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others scrutinizing the Monastery: (7)
    As of 2014-07-26 06:43 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My favorite superfluous repetitious redundant duplicative phrase is:









      Results (175 votes), past polls