Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
Ah, yes, you're right. When I started, I had intended to better show that understanding this one instance, which is a specific form of the GRT, which is in turn a specialization of the Schwartzian Transform, it's fairly easy to generalize the understanding to cover the more general cases.

But, I never quite made it that far.

Thanks for the vote of confidence anyway!

Update: Let's attempt to rectify this situation.

The analysis on the root node is a very specific, sort transformation technique. I've only just scratched the surface, but I believe this technique is very powerful and should be useful for nearly any sorting job. However, it's not the only technique out there. I think that by understanding this technique, you should be able to easily generalize this to better understand the other transformations.

Guttman Rosler Transformations

The next step up are the Guttman Rosler Transformations. The core idea of the GRT is to use the built in perl ascii sorting. In order to use this, we need to create stringified representations of our data first, then sort, then get the original data back. In a lot of cases, the map functions do a lot of work to put the data in the correct form, and then undo that work after the sort to get the data back. An example of this can be found at this node by jdporter.
my @files = <DATA>; chomp @files; @files = map { substr $_, 10 } sort map { sprintf "%10s%s", /(\d+\w{3}\d{4})/, $_ } @files; print join "\n", @files, ''; __DATA__ fwlog.14Mar2005.gz fwlog.15Mar2005.gz fwlog.16Mar2005.gz fwlog.1Mar2005.gz fwlog.2Mar2005.gz fwlog.20Mar2005.gz
Slightly modified to replace original @a array with more descriptive @files array

In this case the OP wanted to sort the files by date. So, the above code took the array of text data in @files, simply grabbed the date string as-is, and appended that onto the front of the original string, sorted, and then extracted the original string. At first glance the map after the sort (the encoding map) doesn't do enough work to sort properly since it doesn't appear to align the days when dealing with single digits. However, it works because the sprintf("%10s"...) right justifies the text which does align the dates.

Schwarzian Transformations

In the same thread, there are multiple versions of a Schwartzian Transformation. A Schwartzian Transformation generally creates an array of 'stuff'; part of that array is the original data, other parts are the pieces to sort against. And the call back sorting routine is specified. Probably the best ST answer given is this one by RazorbladeBidet.
my @months = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec ); my %months; my $i = 1; $months{$_} = sprintf( "%02d", $i++ ) for @months; print $_, "\n" for map { $_->[0] } sort { $a->[1] <=> $b->[1] or $a->[2] <=> $b->[2] or $a->[3] <=> $b->[3] } map { (split(/\./,$_,3))[1] =~ /^(\d+)([A-Za-z]+)(\d+)$/; [$_, $3, $months{$2}, sprintf("%02d", $1) ] } @files;
This is slightly modified from the original to show the intent of the original author. The original had sprintf("%02d", $3), or against the 4 digit year, which ends up to not do anything. I believe he'd intended to use that on the date itself.

Here, he goes through the trouble to map the month abreviations to numbers via a generated hash, and makes sure to sort by year then month then day. Let's see how this works.

Again the map statement after the sort is executed first, and it looks pretty complicated. The split is matching the periods in the filenames, and is only keeping the 2nd part which is the date information. It then matches and returns the individual parts of the date in $1, $2 and $3 corresponding to the day, month, year. Then, an array of 4 items is created, the original string, the year, the month in numeric format, and finally the date with a leading zero if date is a single digit. This is done for every member of the @files array.

The resulting array of arrays is sent to the sort routine which uses the supplied call back function to compare first the year, then the numeric version of the month, and finally the date.

The map function before the sort then gets called to pull out only the original version of the data, which is then printed rather than stored.

Conclusion

Hopefully between the original node, and this subsequent analysis of the other transformations, this has given you enough understanding to use these powerful sorting techniques for your own needs.

-Scott


In reply to Re^2: Understanding transformation sorts (ST, GRT), the details by 5mi11er
in thread Understanding transformation sorts (ST, GRT), the details by 5mi11er

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others examining the Monastery: (4)
    As of 2014-07-31 00:13 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      My favorite superfluous repetitious redundant duplicative phrase is:









      Results (241 votes), past polls