Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

Re: Things I have learnt

by davido (Archbishop)
on Oct 22, 2004 at 03:40 UTC ( #401366=note: print w/replies, xml ) Need Help??

in reply to Things I have learnt

At the risk of providing an inferior answer amid higher authorities, I'll try to explain the Schwartzian Transform.

First, one has to understand why it is useful. Assume you want to sort a list by, for example, ignoring the first character in each element. In other words, you want to sort 'Babc' before "Abcd". You could write such a sort like this:

my @sorted = sort { substr( $a, 1 ) cmp substr( $b, 1 ) } @unsorted;

And that works fine. But remember that the comparison routine of a sort function gets called many times. ...possibly several times per element being sorted. It's somewhat inefficient calling substr twice for each comparison, over and over and over again as the list gets sorted.

There is a better way. What if, instead of calling substr over and over again, multiple times per element being sorted, you called it once per element and stored the results? If you can create, store, and access this cached list fast enough, it will be more efficient than doing complex calculations over and over again on the same element throughout the course of the sort.

The Schwartzian Transform does this. It creates a cache of whatever complex computation you had in mind, and saves it alongside the original data, so that it's easy to keep track of the real data and its cached pre-computed sort criteria at the same time.

The most common form of the Schwartzian Transform takes a flat list, and turns it into a list of references to two-element arrays. Here's how our substr( $string, 1 ) might work in a ST.

@original | @transform | [0] [1] _________ | _________________________ abc | abc bc asdf | asdf sdf jkas | jkas kas qwert | qwert wert uiop | uiop iop

So in the above example:

@original = qw/abc asdf jkas qwert uiop/;


@transformed = ( [ 'abc' , 'bc' ], [ 'asdf' , 'sdf' ], [ 'jkas' , 'kas' ], [ 'qwert', 'wert' ], [ 'uiop' , 'iop' ] );

Now lets look at the complete transform, as commonly seen:

my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [ $_, substr( $_, 1 ) ] } @unsorted;

Look at this from the bottom up:

  1. Start with a list called @unsorted
  2. Feed @unsorted into map. The bottom map takes each element of @unsorted and turns it into an anonymous array with two elements. The first is the original string, and the second is the cached complex calculation.
  3. Feed the list of array-refs to sort. Inside sort, dereference each array-ref to compare the 2nd elements (the cached versions).
  4. Finally, feed this list of anon-array-refs (now sorted into the final order) into the top map. This final step strips away the cached information, leaving only the original strings (the array-refs get thrown away too).
  5. The sorted original strings get assigned to @sorted

In actuality, this particular example may not be all that more efficient, since substr is pretty fast, and anonymous array creation and dereferencing takes time. But for longer lists, and particularly for more complex sort computations, the ST shines.

I hope this helps. :)


Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://401366]
and all is quiet...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (6)
As of 2017-02-25 21:18 GMT
Find Nodes?
    Voting Booth?
    Before electricity was invented, what was the Electric Eel called?

    Results (369 votes). Check out past polls.