Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

With merlyn in the offing it's presumptuous of me to write about the Schwartzian Transform, but here it goes.

As others have pointed out, you don't need the ST to handle this task, but it is a neat trick that one often finds in Perl code, so it's good to become familiar with it in any case.

The ST is just an optimization of the sort operation. It often happens that one wants to sort the elements of an array by some property of the elements other than their alphabetic or numeric order. (In your case, this property is a particular substring, or rather, the number represented by a particular substring.) Suppose that the subroutine property computes this property, and let's say, for the sake of this example, that this property happens to be a number. Then, a naive (ascending) sort would be:

my @sorted = sort { property($a) <=> property($b) } @unsorted;
This works fine, but it is inefficient, because for every element $x in @unsorted, property($x) is computed in every sort comparison involving $x. That's a lot of redundant computation.

The insight behind the ST is to perform the computation of the sorting property only once for each element of the array. Instead of sorting the elements of the original array, we sort "transformed" elements, each consisting of an original element (the "payload") and its property (the "key" or "index") packaged in an anonymous array. I.e. we go from sorting this:

($foo, $bar, $baz, ...)
with the sorting function
{ property($a) <=> property($b) }
to sorting this
([$foo, property($foo)], [$bar, property($bar)], [$baz, property($baz) +], ... )
with the sorting function
{ $a->[1] <=> $b->[1] }
Now, for each element in the array, instead of computing its property many times, we only have to dereference the transformed element (an array ref), which is almost always a much cheaper operation.

Once the transformed array is sorted, we recover the desired sorted array by pulling out the original elements from the corresponding transformed elements.

Here is what this strategy looks like when applied to your problem:

my @st = map { [ $_, property($_) ] } @files; my @sorted_st = sort { $a->[1] <=> $b->[1] } @st; my @sorted = map { $_->[0] } @sorted_st; sub property { ( $_[0] =~ /fwlog\.(\d+)\w+/ )[ 0 ] };
The first line generates the transformed array, whose elements each consists of a "payload" (the original element) and its sorting "property". The second line sorts the transformed array. The third line recovers the "payloads" from the sorted transformed array, to produce the desired sorted array. Actually, notice that only the definition of property is truly specific to your problem; the other lines are pretty generic, modulo sort order and whether the sort is numeric or alphabetic.

That's all there is to it, although there a couple more comments worth making about this. One is that often programmers condense this procedure significantly by "pipelining" the three lines above, and building the property subroutine right into the first call to map. I.e., the above would be condensed to

my @sorted = map { $_->[0] } sort { $a->[1] <=> $b->[1] } map { [ $_, ( $_[0] =~ /fwlog\.(\d+)\w+/ )[ 0 ] ] } @files;
The second point is that it often happens that the sorting needs to be done on several properties: to order two objects whose first property is the same, we order by the second property; if the second property is also the same for both, we order by the third property, etc. In such a situation, it is useful to use a property that returns an array of all the sorting properties, and then the sort line of the above would look something like this:
sort { $a->[1] <=> $b->[1] or $a->[2] cmp $b->[2] or $b->[3] <=> $a->[3] }
In this particular example, the sorting is ascending/numerical by the first property, ascending/alphabetical by the second property, and descending/numerical by the third property.

A third point, which is just a repetition of other comments, is that it is not a foregone conclusion that the ST is going to yield an improvement in speed, or an improvement in speed noticeable enough to make the added complexity of the code worthwhile.

The Guttman-Rosler paper cited in another comment describes the ST on its way to presenting a significantly less straightforward, but more powerful, optimization, the Guttman-Rosler transform. It is nowhere near as handy or common as the ST, but if a sort is your program's bottleneck it's worth trying it out.

the lowliest monk

Update: Added "third point".


In reply to Re: Sort on Number Embedded in String by tlm
in thread Sort on Number Embedded in String by Dru

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and all is quiet...

    How do I use this? | Other CB clients
    Other Users?
    Others meditating upon the Monastery: (3)
    As of 2018-07-20 22:32 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?
      It has been suggested to rename Perl 6 in order to boost its marketing potential. Which name would you prefer?















      Results (441 votes). Check out past polls.

      Notices?