PerlMonks  

Re: Advanced Sorting - GRT - Guttman Rosler Transform

by dmmiller2k (Chaplain)
on Feb 18, 2002 at 19:14 UTC (#146217=note)


in reply to Advanced Sorting - GRT - Guttman Rosler Transform

Wow. I use this trick all the time; I am happy to finally learn it has a name.

I came up with (so I thought) the idea of packing the array elements into decodable strings, sorting those, then decoding them while trying to optimize a particularly hairy Sybase query. I was able to get rid of all the table scans (non-indexed searches) except the last one (the ORDER BY clause). I started experimenting with sorting in Perl and then got sucked into trying to speed it up.
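A minimal Perl sketch of that pack-sort-unpack idea, assuming the values are non-negative integers below 2**32 (the sample data is made up). `pack('N', ...)` produces a 4-byte big-endian string, so the default string sort already orders the packed keys numerically and `sort` needs no comparison block:

```perl
use strict;
use warnings;

# Hypothetical input: non-negative integers below 2**32.
my @unsorted = (1003, 7, 42, 65536, 0);

my @sorted =
    map  { unpack('N', $_) }    # 3. decode: 4-byte string back to a number
    sort                        # 2. plain string sort, no comparison block
    map  { pack('N', $_) }      # 1. encode: 4-byte big-endian string
    @unsorted;

print "@sorted\n";              # prints "0 7 42 1003 65536"
```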

I suppose I've always thought of it as a variation on the ST: precalculating the sort keys using map, sorting based upon those keys, and finally, another map to extract the actual data again. I never realized that the ST specified explicitly creating anonymous arrays.
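For contrast, here is a sketch of the classic ST shape: each item is wrapped in an anonymous array holding its precomputed key, and `sort` gets an explicit comparison block that reaches into that wrapper. The case-insensitive key (`lc`) and the word list are illustrative assumptions only:

```perl
use strict;
use warnings;

my @words = qw(banana Apple cherry apple Banana);

my @sorted =
    map  { $_->[1] }                  # 3. unwrap: keep the original item
    sort { $a->[0] cmp $b->[0] }      # 2. explicit comparison on the precomputed key
    map  { [ lc($_), $_ ] }           # 1. wrap: [ key, original ]
    @words;

print "@sorted\n";
```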

Thanks (and ++), demerphq, for giving this a name.

Update: I forgot to add that I've been using sprintf() rather than pack(), which would obviously be faster still (did I say, "Thanks," demerphq?). In my case, though, there were enough rows that the key-generation phase was more or less insignificant compared with the actual sort. The hardest part is inverting the portions of the key that must sort descending.

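In other words, take dws's example: if you had to sort ascending by the first column ('foo' in the example), then descending by the second column (47 and 103, respectively), you would need to invert the second column by complementing it against the largest value its field can hold. E.g., if the name column has up to 10 characters and the value column holds 0 to 9999 (four digits), store 9999 minus the value (strictly speaking, the nines' complement), and left-justify the name so that shorter names still sort first:

my @sorted =
    map  { [ unpack('A10', $_), 9999 - substr($_, 10, 4) ] }    # decode: trim padding, undo the complement
    sort                                                        # plain string sort, no comparison block
    map  { sprintf('%-10.10s%04d', $_->[0], 9999 - $_->[1]) }   # encode: name left-justified, value inverted
    @unsorted;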
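Since pack() is the faster alternative to sprintf() mentioned above, here is a hedged sketch of the same two-column key built with pack() instead. It assumes values are integers from 0 to 0xFFFFFFFF, so that `N` (a 32-bit big-endian unsigned integer) can hold both the value and its complement, and names of at most 10 printable characters; `A10` space-pads on pack and strips trailing spaces on unpack. The rows are made up in the shape of dws's example:

```perl
use strict;
use warnings;

# Hypothetical rows: [ name, value ].
my @unsorted = ( [ 'foo', 47 ], [ 'foo', 103 ], [ 'bar', 5 ] );

# Ascending by name, descending by value: storing 0xFFFFFFFF - value
# makes larger values produce smaller (earlier-sorting) keys.
my @sorted =
    map  { my ($name, $inv) = unpack('A10 N', $_); [ $name, 0xFFFFFFFF - $inv ] }  # decode
    sort                                                                           # plain string sort
    map  { pack('A10 N', $_->[0], 0xFFFFFFFF - $_->[1]) }                          # encode
    @unsorted;
```

The binary `N` field is fixed-width and byte-comparable without any zero-padding, which is where the speed-up over sprintf() would come from.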

dmm

If you GIVE a man a fish you feed him for a day
But,
TEACH him to fish and you feed him for a lifetime


Re: Re: Advanced Sorting - GRT - Guttman Rosler Transform
by merlyn (Sage) on Feb 18, 2002 at 19:32 UTC
    I suppose I've always thought of it as a variation on the ST: precalculating the sort keys using map, sorting based upon those keys, and finally, another map to extract the actual data again. I never realized that the ST specified explicitly creating anonymous arrays.
    And I agree with you, but I'm not willing to fight for that belief being universal (that the GRT is really just a specialization of the ST). I've got other issues to fight.

    In case you were curious. {grin}

    Another way to look at it is that I made "map-sort-map" popular in the Perl community, by issuing one specific use of it which some have come to know as the ST. The GRT gang came up with another use that is harder to generalize, but can produce better results for a subrange of problems. I don't believe that the GRT can be sufficiently generalized in any practical sense to cover all problems, hence knowing both manifest map-sort-map strategies (and inventing others) is probably the best meta-strategy.

    -- Randal L. Schwartz, Perl hacker

      And I agree with you, but I'm not willing to fight for that belief being universal (that the GRT is really just a specialization of the ST).

      Actually, if I've presented the GRT as anything other than a refinement of the Schwartzian Transform, then I've expressed myself poorly.

      The whole reason I started by discussing the Schwartzian Transform in my original post was to explain GRT in the context of the Schwartzian Transform.

      Personally, I would call anything along these lines a Schwartzian Transform and use the more specific term GRT when it is warranted, but this is not in the slightest meant to imply that a GRT isn't a Schwartzian Transform. I suppose it's like the square/rectangle idea: all GRTs are Schwartzian Transforms, but not all Schwartzian Transforms are GRTs.

      I would say that in the texts I've read (and your own snippet here in the monastery), Schwartzian Transforms are characterized by an explicit comparison ("inorder") function that accesses part of a wrapper containing the precomputed keys, whereas GRTs are characterized by having no explicit comparison function at all.

      I don't believe that the GRT can be sufficiently generalized in any practical sense to cover all problems, hence knowing both manifest map-sort-map strategies (and inventing others) is probably the best meta-strategy.

      I couldn't agree more. (I said something like this at the end of my post, although not in so many words.)

      Oh and thank you for introducing such an interesting technique to me and the community as a whole. I use it all the time.

      PS - It would be really cool if you retitled your snippet Schwartzian Transform to "Schwartzian Transform (ST)" so that we could link to it via [ST] ;-) Or even wrote a new node explaining it a bit, straight from the horse's mouth, so to speak. But perhaps you don't have enough time?

      Yves / DeMerphq
      --
      When to use Prototypes?

      I don't believe that the GRT can be sufficiently generalized in any practical sense to cover all problems
      Given a function to convert floats to lexicographically ordered strings, you should be able to do a GRT on any series of cmp and <=> comparisons linked with ||. Are there any antisymmetric, transitive, total relations that can't be written as a series of ||-linked cmp and <=> comparisons?
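Taking that claim at face value, here is a minimal Perl sketch of such a float-to-string function and of a GRT that replaces a two-key `cmp`/`<=>` comparison. The helper names (`float_key`, `invert`, `grt_sort`) are hypothetical, and the sketch assumes Perl 5.10 or later (for the `d>` pack template), IEEE 754 doubles, no NaNs, and names without `\0` bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: map a double to an 8-byte string whose string (cmp)
# order matches its numeric (<=>) order.
sub float_key {
    my ($x) = @_;
    my $bytes = pack('d>', $x);                          # IEEE 754, big-endian
    if (ord($bytes) & 0x80) {                            # sign bit set: negative
        $bytes = invert($bytes);                         # flip all bits, reversing their order
    } else {                                             # zero or positive
        substr($bytes, 0, 1) = chr(ord($bytes) ^ 0x80);  # set sign bit: sorts above negatives
    }
    return $bytes;
}

# Hypothetical helper: complement every byte of a fixed-width key,
# which reverses its sort order.
sub invert { return join '', map { chr(255 - ord($_)) } split //, $_[0] }

# GRT equivalent of
#   sort { $a->[0] cmp $b->[0] || $b->[1] <=> $a->[1] } @records
# for records of the form [ name, number ].
sub grt_sort {
    my @records = @_;
    return map  { $records[ unpack('N', substr($_, -4)) ] }  # decode: trailing record index
           sort                                             # plain string sort
           map  { $records[$_][0] . "\0"                    # name, ascending; "\0" terminates it
                  . invert(float_key($records[$_][1]))      # number, descending
                  . pack('N', $_) }                         # record index
           0 .. $#records;
}
```

Appending the record index means the decode step never has to turn a key back into a name and a number, and exact ties fall back to input order.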
        Easy. Imagine you have to sort objects written by someone who understands encapsulation (hence, unlikely to be a native Perl programmer). There's no way to compare two such objects with an operator; you have to call a method on one of them, passing the other as the argument, for instance:
        @sorted = sort {$a->cmp($b)} @unsorted;
        I doubt you can use either an ST or a GRT (I am one of the people who don't think the GRT is a special case of the ST) to sort this.
