Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??
I thought I would add to this snippet since I had to stare at the code for far to long before I figured out how I might use it.

Okay so we know it is for sorting, but why is it useful? It is a heck of a lot faster for sorting large data sets where the data has to be manipulated before sorting. The example I came up with is this:

We have a huge list of names, and we want to sort them alphabetically by last name, first name, then middle name. That doesnt sound too bad. But the problem is that our database does not store last names, first names or middle names seperately. Instead there is just a single name field like "Arthur C. Clarke". So when we get all the data from the database first we have to put the string in alphapbetical sort order (ie make "Arthur C. Clarke" into "Clarke Arthur C.") then sort it.

Someone without Randal Schwartz's insight (namely me) would have done something like this:
my @sorted = sort mysort @unsorted; sub mysort { $a =~ m/(.*?)\s*(\S+)$/; $aa = "$2 $1"; $b =~ m/(.*?)\s*(\S+)$/; $bb = "$2 $1"; return $aa cmp $bb; }
Now this would be bad for large data sets because each element in @unsorted will get sent to mysort several time, so we are then performing the match several times where we dont really need to.

So in order to only perform the match once we go through the array and replaces each element with an array reference containing the original element (dont loose the original!) and the new sortable string. So the first step:
@unsorted = ("Larry Wall", "Arthur C. Clarke"); @a = map { m/(.*?)\s*(\S+)$/; [$_, "$2 $1" ] } @unsorted; # # now @a is (["Larry Wall", "Wall Larry"], # ["Arthur C. Clarke", "Clarke Arthur C."]) #
Now we do a simple sort on the second element in @a: @b = sort {$a->[1] cmp $b->[1]} @a; And finally we turn the 2D array back into a sorted 1D array restoring the original values: @sorted = map {$_->[0]} @b; If you cascade the function calls, then you remove the intermediate steps of my @a, and @b and you get the Schwartzian Transform:
my @sorted = map{ $_->[0] } sort {$a->[1] cmp $b->[1]} map { m/(.*?)\s*(\S+)$/; [$_, "$2 $1" ] } @unsorted;
Now that is pretty cool stuff, but how much of a time savings is this extra work? Well, I did a small benchmarking test to mimic what a time savings would under a large data set:
#!/usr/bin/perl use Benchmark; @unsorted = ("Larry Wall", "Jane Sally Doe", "John Doe", "Morphius", "Jane Alice Doe", "Arthur C. Clarke"); timethese(100000, { "schwartzian" => sub { my @sorted = map{ $_->[0] } sort {$a->[1] cmp $b->[1]} map { m/(.*?)\s*(\S+)$/; [$_, "$2 $1" +] } @unsorted }, "sort routine" => sub { my @sorted = sort mysort @unsorted } }); sub mysort { $a =~ m/(.*?)\s*(\S+)$/; $aa = "$2 $1"; $b =~ m/(.*?)\s*(\S+)$/; $bb = "$2 $1"; return $aa cmp $bb; }
And the results:
Benchmark: timing 100000 iterations of schwartzian, sort routine... schwartzian: 57 wallclock secs (53.66 usr + 0.30 sys = 53.96 CPU) sort routine: 124 wallclock secs (122.48 usr + 0.41 sys = 122.89 CPU)

I hope this helps.

In reply to RE: Schwartzian Transform by perlmonkey
in thread Schwartzian Transform by merlyn

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others scrutinizing the Monastery: (4)
As of 2024-03-19 07:25 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found