PerlMonks  

Re^4: best sort

by jpl (Monk)
on Aug 16, 2011 at 16:27 UTC (#920519)


in reply to Re^3: best sort
in thread best sort

If those were the things I was looking into, I would certainly not want a bare sort for any of them.

If you anticipate repeated elements, you'd probably want a tie-breaking $a cmp $b (the bare sort comparison function) on all but the first, because non-identical terms can otherwise compare equal and identical elements may not be adjacent in the sorted output. But I think we are in fundamental agreement: you need to know what you are comparing and how. That may be easier said than done. Knowing that "machenry" is a Scottish name but "machinery" is not (or is it? and if not, why not?) makes "knowing what you are comparing" non-trivial.
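To make the adjacency point concrete, here is a minimal sketch. The word list and the use of lc() as the "folding" comparison are assumptions chosen for illustration, not anything from the thread's actual data.

```perl
use strict;
use warnings;

# Hypothetical input: the same two spellings, interleaved.
my @words = qw(word Word word Word apple);

# Folded key only: "word" and "Word" compare equal, so nothing in the
# comparison itself forces identical strings to sit next to each other.
my @folded = sort { lc($a) cmp lc($b) } @words;

# Same primary key, with the bare-sort comparison as the last resort:
# strings that fold the same are then ordered by their exact text,
# so identical strings become adjacent.
my @tiebroken = sort { lc($a) cmp lc($b) || $a cmp $b } @words;

print "@folded\n";
print "@tiebroken\n";
```

With the tie-breaker, the two copies of each spelling come out next to each other regardless of how they were arranged in the input.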


Re^5: best sort
by tchrist (Pilgrim) on Aug 16, 2011 at 17:30 UTC
    If those were the things I was looking into, I would certainly not want a bare sort for any of them.
    If you anticipate repeated elements, you'd probably want a tie-breaking $a cmp $b
    Well, sure. Here’s some of the code to generate one of those:
    say $_->{PHRASE} for sort {
           # vowel counts: largest first
           $b->{TOTAL_VOWELS}  <=> $a->{TOTAL_VOWELS}
        || $b->{MAX_ANY_VOWEL} <=> $a->{MAX_ANY_VOWEL}
        || $b->{NUM_OF_A}      <=> $a->{NUM_OF_A}
        || $b->{NUM_OF_E}      <=> $a->{NUM_OF_E}
        || $b->{NUM_OF_I}      <=> $a->{NUM_OF_I}
        || $b->{NUM_OF_O}      <=> $a->{NUM_OF_O}
        || $b->{NUM_OF_U}      <=> $a->{NUM_OF_U}
        || $b->{NUM_OF_Y}      <=> $a->{NUM_OF_Y}
           # then folded text and record number, ascending
        || $a->{DICTFOLD}      cmp $b->{DICTFOLD}
        || $a->{RECNO}         <=> $b->{RECNO}
    } @records;
    Look more reasonable?
      It helps a lot with the "how I am comparing" part, although if anyone guessed in advance that this was how "By vowels:" sorted, I'd like to solicit their advice on the outcome of upcoming NFL games. It's still not exactly what I had in mind for bringing all identical terms together. The $a->{RECNO} <=> $b->{RECNO} is unnecessary if the sort is stable, as sort() is, by default. Since different words can compare equal under the influence of $a->{DICTFOLD} cmp $b->{DICTFOLD} (if I'm correctly guessing what DICTFOLD is), I still might see
      word Word word Word
      in the sorted output, when I might have preferred
      word word Word Word
      which makes it easier to determine if words are "unique" without repeating all the complicated logic. So I would prefer $a->{ORIGINAL} cmp $b->{ORIGINAL} as the final tie-breaker. That's neither better nor worse than your code, merely different.
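      The difference between the two final tie-breakers can be sketched as follows. This is only an illustration under stated assumptions: DICTFOLD is guessed to be a case-folded copy of the text (here just lc()), ORIGINAL is the hypothetical field proposed above, and the four records are made up.

```perl
use strict;
use warnings;

# Hypothetical records; RECNO is simply the input position.
my $recno = 0;
my @records = map {
    +{ ORIGINAL => $_, DICTFOLD => lc($_), RECNO => $recno++ }
} qw(word Word word Word);

# Final tie-breaker on RECNO: records that fold the same keep input order.
my @by_recno = map { $_->{ORIGINAL} } sort {
       $a->{DICTFOLD} cmp $b->{DICTFOLD}
    || $a->{RECNO}    <=> $b->{RECNO}
} @records;

# Final tie-breaker on ORIGINAL: identical spellings end up adjacent.
my @by_original = map { $_->{ORIGINAL} } sort {
       $a->{DICTFOLD} cmp $b->{DICTFOLD}
    || $a->{ORIGINAL} cmp $b->{ORIGINAL}
} @records;

print "@by_recno\n";      # word Word word Word
print "@by_original\n";   # Word Word word word
```

      Plain cmp happens to put the capitalized spelling first, so the groups come out in the opposite order from the one shown above; the point is only that identical terms are now adjacent, which makes a simple "same as previous line" uniqueness check possible.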
