Beefy Boxes and Bandwidth Generously Provided by pair Networks
Just another Perl shrine
 
PerlMonks  

Understanding the Schwartzian transform.

by Anonymous Monk
on Jul 22, 2013 at 06:36 UTC ( [id://1045598]=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi.

I was looking at a article that MJD wrote regarding the Schwartzian Transform and I simply wanted make sure I was interpreting (pun intended) the mechanism correctly:

@sorted_names = map { $_->[0] } sort { $b->[1] <=> $a->[1] } map { [ $_, -M $_ ] } readdir D;

Reading from the bottom up, I think perl: Reads the 'D' directory, builds an anonymous array containing the name of the file and its associated size in bytes which is then converted to a list(?) by map before the aforementioned items are sorted in reverse order. The top most map operation is slightly confusing since it appears to be doing something with the first element of the list before placing the sorted items in the array. Am I close to understanding it? Is the Schwartzian Transform *always* constructed in this manner?

Thanks everyone.

Replies are listed 'Best First'.
Re: Understanding the Schwartzian transform.
by tobyink (Canon) on Jul 22, 2013 at 07:51 UTC

    The general form of the Schwartzian Transform is this:

    my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } # or <=>, or $b before $a map { [$_, f($_)] } @unsorted;

    ... where f($foo) is a comparatively expensive function.

    For the purposes of this discussion, let's suppose that @unsorted is qw(8 11 9 4 3) and f($n) is a function that converts a number into Roman numerals. So we're sorting the numbers alphabetically by Roman numeral. We want the result qw(3 4 9 8 11) because 3 is "iii" and 11 is "xi".

    Reading from bottom to top, the first transform map { [$_, f($_)] } converts each number to an arrayref where the first element is the original number, and the second is the Roman numeral for it:

    # the first map does this qw(8 11 9 4 3) --> ( [ 8, "viii"], [11, "xi"], [ 9, "ix"], [ 4, "vi"], [ 3, "iii"], )

    Now we sort the transformed list alphabetically by $_->[1]:

    # the sort does this: ( [ 8, "viii"], [11, "xi"], [ 9, "ix"], [ 4, "vi"], [ 3, "iii"], ) --> ( [ 3, "iii"], [ 9, "ix"], [ 4, "vi"], [ 8, "viii"], [11, "xi"], )

    And the second transform map { $_->[0] }, just pulls out the original number from each arrayref:

    ( [ 3, "iii"], [ 9, "ix"], [ 4, "vi"], [ 8, "viii"], [11, "xi"], ) --> qw( 3 9 4 8 11 )

    Let's see all that in action:

    use strict; use warnings; use Roman (); my @unsorted = qw(8 11 9 4 3); *f = \&Roman::roman; my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } # or <=>, or $b before $a map { [$_, f($_)] } @unsorted; print "@sorted\n"; __END__ # output shown below 3 4 9 8 11

    The sharp-eyed may have spotted that there's a simpler way to do the same sort:

    my @sorted = sort { f($a) cmp f($b) } @unsorted;

    So why the transform? Well, it enables you to call f() fewer times; the function is called just once for each item on the list. Using the simpler version it's called twice for each pair of items that is compared, and there are typically more comparisons performed than items on the list. Let's have f() log how many times it was called. We'll define it like this:

    *f = sub { $::count++; END { print "f() called $::count times\n" }; goto \&Roman::roman; };

    Running this on my computer (and this may vary depending on the sorting algorithm that your version of Perl uses - it's changed once or twice), f() is called 10 times. With the transform it's called 5 times.

    If we extend the original unsorted list to 100 items; e.g.:

    use List::Util "shuffle"; my @unsorted = shuffle(1..100);

    ... then f() gets called typically over 1000 times, compared to exactly 100 for the version with the transform. If f() takes 0.01 seconds to run, that's a 10 seconds without the transform versus 1 second with the transform. If the unsorted list were 1000 items long, it would be about 3 minutes without the transform, versus 10 seconds with the transform.

    (PS: Roman is available on CPAN; published by our fellow monk chorny.)

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: Understanding the Schwartzian transform.
by choroba (Cardinal) on Jul 22, 2013 at 06:40 UTC
    The map builds a list of anonymous arrays, or better, it builds a list of tuples file name, file age. Sort then sorts the tuples based on their second element (the age), the final map throws the age away and only leaves the names.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Understanding the Schwartzian transform.
by Loops (Curate) on Jul 22, 2013 at 06:42 UTC

    The main thing to grok is that the first map creates an anonymous array reference with the first element containing the data and the second element containing the sort key. After sorting by the second element, the final map ignores the sort key and just returns the data element from the array.

    choroba is too fast for me...
Re: Understanding the Schwartzian transform.
by hdb (Monsignor) on Jul 22, 2013 at 06:44 UTC
Re: Understanding the Schwartzian transform.
by rjt (Curate) on Jul 22, 2013 at 06:47 UTC

    In English, from right to left (bottom up, in this case): read list of file names in directory. Store each name in an array ref containing the filename, and its modification time. Take that new list, and sort it in descending order based on a custom sort sub that does a numerical compare on the 2nd element of the array ref (I.e., the modification time). Finally, take that now-sorted list of array refs and map each ref back to the filename by extracting the first element (remember arrays are 0-based). The result is stored in @sorted_names. Try breaking that up into its components and printing out the result of each step if you're still unsteady on the details. Hope this helps.

      ...and its associated size in bytes
      s/associated size in bytes/modification time/;


      Excellent. Thanks all for the help. Is this the canonical form of the Schwartzian Transform or a variant?

        Is this the canonical form of the Schwartzian Transform or a variant?

        Yes, it can be considered as the canonical form of the Schwartzian Transform. The original ST that appeared in a newsgroup more than 15 years ago might not have worked on fila age (I don't remember), but, definitely, the "output_array = map {block} sort {block} map {block} input array" is the canonical form of the Schwartzian Transform.

Re: Understanding the Schwartzian transform.
by Anonymous Monk on Jul 22, 2013 at 06:58 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://1045598]
Approved by hdb
Front-paged by rjt
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (5)
As of 2024-03-19 10:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found