Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

which seems to negate the use of GRT or ST sorting.

Seems being the key word here. :-)

Its true that signifigant work was done to Perls sorting code between 5.6.1 and 5.8.0. Its true that many special cases have now been optmized. In fact it turns out that some of the optimisations that have occured in this period would cause the benchmark I originally posted show bad results for ST and GRT. The switch from quicksort to mergesort means that on average less comparisons are performed per sort and as the "bare" variant does the tr/// per comparison this has a adirect effect on the results. It also appears that optimisations have occured that make the ST _much_ more competitive with the GRT (GRT still wins in the benchmarks I have done however.) Also it appears optimisations have been done on tr/// in count mode, making it a most unsuitable benchmark candidate. Even worse (for the benchmark that is, everybody else gets a win :-) is that mergesort behaves particularly well on almost ordered data. As my test set is relatively ordered (due to the repetive elements) this has a particularly signifigant effect. Simply shuffling the records before the sort (after the replication) causes a dramatic change in the performance.

What all of this means is not that the ST and GRT are "negated" but rather that the circumstances under which they are useful is reduced. This is a good thing. However, the fact still remains that given a relatively expensive comparison function the ST and GRT still win, and the GRT still beats the ST. This can be clearly seen by replacing the calls to tr/// with a subroutine that does the same thing.

Yes, perl has gotten "better" at sorting. No, the ST and GRT are not redundant now. However given the test results I've seen so far I would probably not bother with the GRT. The ST would appear to be nearly the same performance, and a lot easier to handle.When you play around with things, using different comparsion functions, different data sets and distributions, etc you see that the GRT and ST still beat the "bare" sort. And thats the point here. If the comparsion function is expensive, precalculate it. Optimisations in perl may make a given example not behave as expected (which just reinforces the point that benchmarking should happen after the code is complete and not before) but overall, reducing the cost of the comparison still wins you some time. Thus IMO the ST and its derivatives (GRT) will be useful tools for a long time to come.

Cheers,


---
demerphq

    First they ignore you, then they laugh at you, then they fight you, then you win.
    -- Gandhi



In reply to Re: Re: Re: Re: Advanced Sorting - GRT - Guttman Rosler Transform by demerphq
in thread Advanced Sorting - GRT - Guttman Rosler Transform by demerphq

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having a coffee break in the Monastery: (5)
As of 2024-03-28 23:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found