http://www.perlmonks.org?node_id=476700

GrandFather has asked for the wisdom of the Perl Monks concerning the following question:

I see references to "Schwartzian Transform" often around the Monastery. What is it? A search in Wikipedia turns up "Schwarzian derivative", but that doesn't help my understanding much!

Update: s/Schwarzian/Schwartzian/g

Perl is Huffman encoded by design.

Retitled by holli from 'What is "Schwarzian Transform"'.

Re: What is "Schwartzian Transform"
by monkfan (Curate) on Jul 21, 2005 at 03:56 UTC
    I think it is called "Schwartzian Transform". That's why you couldn't find it ;-). Basically it's a sorting technique for arrays of multiple fields. With it you can sort them according to any fields you prefer. Suppose you have this data:
    -r--r--r-- 1 yourname 8318 Jan 30 1996 file1.txt
    -r--r--r-- 1 yourname 11986 Jan 30 1996 file2.txt
    -r--r--r-- 1 yourname 46852 Feb 27 1996 file3.txt
    -r--r--r-- 1 yourname 72698 Feb 27 1996 file4.txt
    And you want to sort them according to size. You would do it with ST like this:
    @sorted_by_size =
        map  { $_->[0] }
        sort { $a->[1] <=> $b->[1] }
        map  { [$_, -s] }
        @files;
    So in principle you would do an ST in the following steps:
    1. Map the initial list into a list of references to lists holding the original and modified values
    2. Sort the list of references
    3. Map the list of references back into a plain list with the initial values
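A minimal sketch of those three steps, run on the listing itself rather than on real files. It assumes each entry is a plain string and that the size is the fourth whitespace-separated field; the `@lines` array is made up for the illustration.

```perl
use strict;
use warnings;

# Assumption: the listing lines are held as strings, in no particular order.
my @lines = (
    '-r--r--r-- 1 yourname 46852 Feb 27 1996 file3.txt',
    '-r--r--r-- 1 yourname 8318 Jan 30 1996 file1.txt',
    '-r--r--r-- 1 yourname 72698 Feb 27 1996 file4.txt',
    '-r--r--r-- 1 yourname 11986 Jan 30 1996 file2.txt',
);

my @sorted_by_size =
    map  { $_->[0] }                       # 3. unwrap: keep only the original line
    sort { $a->[1] <=> $b->[1] }           # 2. sort on the precomputed size
    map  { [ $_, (split ' ', $_)[3] ] }    # 1. wrap: [ line, size field ]
    @lines;

print "$_\n" for @sorted_by_size;
```

The chain is read from the bottom up: the last `map` runs first, the first `map` runs last.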
    Check this out by the very creator himself - Randal Schwartz (merlyn).
    Regards,
    Edward
      How is the sort named after one of the many perlmonks here any more advantageous than a simple:
      use strict;

      my @data = (
          ['-r--r--r--','1','yourname','8318','2000-01-01','ant.txt'],
          ['-r--r--r--','1','yourname','11986','1992-12-30','tiger.txt'],
          ['-r--r--r--','1','yourname','72698','2004-03-03','duck.txt'],
          ['-r--r--r--','1','yourname','46852','1788-01-26','goose.txt']
      );
      print( "Which column to sort on? " );
      # Capture the digits; the list assignment leaves $column undefined when nothing matches
      my ($column) = ( <STDIN> =~ m/^(\d+)$/ );
      exit( 1 ) if ( ! defined($column) );
      my @sorted = sort { $a->[$column] cmp $b->[$column] } @data;
      print( join( "\n", map { join( ",", @{$_} ) } @sorted ) );

      I.e., just select the column (or hashref) you want to sort on.

      update: I very much like kelan's answer to this post, it indeed shows one area where this other process comes in useful.

        The usual use is when the sort key must be derived from the data using a process that takes significant time/resources. The idea of the ST is to precalculate all of the sort keys before the sort operation so you only spend the time once:

        @sorted =
            map  { $_->[ 1 ] }
            sort { $a->[ 0 ] <=> $b->[ 0 ] }
            map  { [ expensivefunc( $_ ), $_ ] }
            @data;
        Doing a naive sort, you would be calling that expensive function twice for every comparison, which would end up being a lot more than when precalculating:
        @sorted = sort { expensivefunc( $a ) <=> expensivefunc( $b ) } @data;
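To see the difference, one can count the calls. The sketch below is only an illustration: `expensivefunc` here is a hypothetical stand-in that returns the string length and bumps a counter, and the word list is made up (all lengths distinct, so both sorts must agree on the order).

```perl
use strict;
use warnings;

my $calls = 0;

# Hypothetical stand-in for a costly key computation: counts its own calls.
sub expensivefunc { $calls++; return length $_[0] }

my @data = qw(banana fig coconut pear apple watermelon blueberry);

# Naive sort: the key is recomputed for both sides of every comparison.
$calls = 0;
my @naive = sort { expensivefunc($a) <=> expensivefunc($b) } @data;
my $naive_calls = $calls;

# Schwartzian Transform: the key is computed exactly once per element.
$calls = 0;
my @st =
    map  { $_->[1] }
    sort { $a->[0] <=> $b->[0] }
    map  { [ expensivefunc($_), $_ ] }
    @data;
my $st_calls = $calls;

print "naive: $naive_calls calls, ST: $st_calls calls\n";
```

The ST count always equals the number of elements; the naive count is twice the number of comparisons the sort happens to make.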

      In that case I won't mention whose node I copied the text from that I pasted into Wikipedia and into the title of my node. :)


      Perl is Huffman encoded by design.

        No reason to perpetuate the error by leaving it uncorrected...

        Update: s/Schwarzian/Schwartzian/

        the lowliest monk

Re: What is "Schwartzian Transform"
by tlm (Prior) on Jul 21, 2005 at 03:59 UTC
      LOL - your nodes' titles have been incorrectly spelled, swapping "z" and "t". But as I say below, I would leave this thread's titles as they are now, with all the errors: we have automatic spelling correction for free in quick searches :)

      Flavio
      perl -ple'$_=reverse' <<<ti.xittelop@oivalf

      Don't fool yourself.
Re: What is "Schwartzian Transform"
by Zaxo (Archbishop) on Jul 21, 2005 at 04:17 UTC
Re: What is "Schwartzian Transform"
by SolidState (Scribe) on Jul 21, 2005 at 05:48 UTC
    A good place to read about sorting in Perl and specifically the "Schwartzian Transform", is Shlomo Yona's lecture slides on Sorting in Perl:
    http://yeda.cs.technion.ac.il/~yona/perl/lecture5/index.html
    This will give you not only a good explanation of the Schwartzian Transform itself but will also put it in context with other sorting techniques. Check out the "Orcish Maneuver" - sounds like something you would do in World of Warcraft :-)
Re: What is "Schwarzian Transform"
by polettix (Vicar) on Jul 21, 2005 at 08:08 UTC
    I'd restore the original, misspelled title. A good part of the added value is in the fact that someone in the future could make the same error, so having this thread pop up immediately would help a lot. But maybe the title in this reply will suffice, just in case all the others decide to fix theirs :)

    Flavio
    perl -ple'$_=reverse' <<<ti.xittelop@oivalf

    Don't fool yourself.
Re: What is "Schwartzian Transform"
by furry_marmot (Pilgrim) on Jul 21, 2005 at 17:16 UTC
    If I remember correctly, the name was coined by Joseph Hall, who co-wrote Effective Perl Programming with Randal Schwartz. As has been mentioned here, the primary reason for the transform is efficiency. Computing the sort term first and eliminating assignments to temporary variables via the list processing features of Perl turns out to yield substantial savings. Here's an example from something I just worked on. I needed to write some code to group the sale prices of recently sold homes by $100k-199k, $200k-299k, etc. and then sort them. To group the prices, instead of using a range or if ($x->{SP} >= 100 and $x->{SP} < 200) {...} elsif ($x->{SP} >= 200 and $x->{SP} < 300) {...} etc., I just computed int($home->{SP}/100)*100. Now 128 and 192 become 100, 202 and 246 become 200, etc. There was more to it, but this is an example.
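The bucketing arithmetic can be checked in isolation. In this small sketch `price_bucket` is a name invented for the illustration, and prices are assumed to be in thousands of dollars, as in the figures above.

```perl
use strict;
use warnings;

# Hypothetical helper: round a price (in $k) down to the start of its $100k bucket.
sub price_bucket {
    my ($price) = @_;
    return int($price / 100) * 100;
}

printf "%d -> %d\n", $_, price_bucket($_) for (128, 192, 202, 246);
```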

    Now, on the face of it, the sorting would look something like

    @sorted = sort { int($a->{SP}/100)*100 <=> int($b->{SP}/100)*100 } @unsorted;
    The problem is that a sort compares items many more times than there are items: the number of comparisons is on the order of N*log2(N). Thus, sorting 1000 items takes roughly 10,000 comparisons, and since the sort term is computed for both sides of each comparison, that is some 20,000 instances of dereferencing, doing some math, lopping off the decimals, etc. With more complicated sort terms, it can get quite hairy.
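The number of comparisons a sort really makes is easy to measure by counting inside the sort block. A rough sketch, assuming 1000 made-up records with random prices; the exact count depends on the data and on the Perl version, so no particular figure is claimed here.

```perl
use strict;
use warnings;

srand(42);    # fixed seed so repeated runs on the same perl see the same data

# Assumption: 1000 made-up home records with prices between 100 and 999 ($k).
my @unsorted = map { +{ SP => 100 + int(rand(900)) } } 1 .. 1000;

my $comparisons = 0;
my @sorted = sort {
    $comparisons++;    # one tick per comparison; the sort term is computed twice below
    int($a->{SP}/100)*100 <=> int($b->{SP}/100)*100
} @unsorted;

printf "%d items, %d comparisons, %d sort-term computations\n",
    scalar @unsorted, $comparisons, 2 * $comparisons;
```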

    So for efficiency, the ST creates an array of two-element lists of the form

    ( [$sort_term, $ref_to_orig_data], [$sort_term, $ref_to_orig_data], etc)

    Then you just sort the whole thing once. The trick is to do it without temporary variables. This is where map can be so useful. Read it from the bottom up.

    @sorted_refs =
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  { [ int($_->{SP}/100)*100, $_] }
        @unsorted_refs;

    You can also do this with additional terms, such as sorting within groupings.

    @sorted_refs =
        map  { $_->[2] }
        sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }
        map  { [ int($_->{SP}/100)*100, DateToTmStr($_->{SaleDate}), $_] }
        @unsorted_refs;
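A runnable sketch of the two-key version. `DateToTmStr` is not shown in the post, so `date_key` below is a hypothetical stand-in that assumes dates are 'YYYY-MM-DD' strings and turns them into the number YYYYMMDD; the records are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for DateToTmStr(): '2005-03-14' becomes 20050314.
sub date_key {
    my ($date) = @_;
    (my $digits = $date) =~ tr/-//d;    # copy the date, then delete the dashes
    return $digits + 0;
}

my @unsorted_refs = (
    { SP => 246, SaleDate => '2005-03-14' },
    { SP => 128, SaleDate => '2005-06-01' },
    { SP => 202, SaleDate => '2005-01-20' },
    { SP => 192, SaleDate => '2005-02-11' },
);

my @sorted_refs =
    map  { $_->[2] }                                       # unwrap the record
    sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] }    # bucket first, then date
    map  { [ int($_->{SP}/100)*100, date_key($_->{SaleDate}), $_ ] }
    @unsorted_refs;

print "$_->{SP} $_->{SaleDate}\n" for @sorted_refs;
```

Within each $100k bucket the records come out by sale date, not by exact price.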

    I read an article a few years ago which takes this concept further and recommends, for certain data, concatenating together the sort term, a connector of some kind, and the original data as a single string. By eliminating the dereferencing, you can save quite a bit of time; though this only works if you have data that can be serialized without adding even more work than you save.
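A minimal sketch of that packed-string idea, under stated assumptions: the records are made-up "address:price" strings, and the price is zero-padded to a fixed width of 10 digits so that plain string order matches numeric order. The fixed width takes the place of a connector, since the original data can be cut back out with `substr`.

```perl
use strict;
use warnings;

# Assumption: each record is a plain string ending in ":<price in $k>".
my @homes = ('12 Oak St:246', '9 Elm Ave:128', '3 Pine Rd:1202', '77 Lake Dr:192');

my @sorted =
    map  { substr($_, 10) }    # strip the 10-character key, leaving the original string
    sort                       # default string sort: no sort block, no dereferencing
    map  {
        my ($price) = /:(\d+)$/;
        sprintf('%010d', $price) . $_    # fixed-width key glued in front of the data
    }
    @homes;

print "$_\n" for @sorted;
```

Without the zero padding, 1202 would sort before 128 as a string, so the fixed-width key is what makes the default sort usable here.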

        Oops! Apologies to Mr. Christiansen.
Re: What is "Schwarzian Transform" (aka Schwartzian)
by Codon (Friar) on Jul 21, 2005 at 23:39 UTC

    Others have mentioned several books already. I think the concept is actually covered quite well in Learning Perl Objects, References & Modules, which was authored by the transform's namesake. The key to the transform is to greatly reduce the number of times you do an expensive operation in order to derive your sort values.

    Ivan Heffner
    Sr. Software Engineer, DAS Lead
    WhitePages.com, Inc.
Re: What is "Schwarzian Transform" (aka Schwartzian)
by planetscape (Chancellor) on Jul 22, 2005 at 07:34 UTC