http://www.perlmonks.org?node_id=122012

Recently, a question was asked on the Perl Beginners mailing list about how to sort data with embedded dates. All of the data was stuffed into a scalar with two newlines separating each row (though I suspected that the person asking the question may have gotten that incorrect). Basically, we were presented with the following scalar:

my $data = "buy 1/23/01 100 12.625 50 25.25 \n
buy 09/1/01 100 12.625 50 25.25 \n
buy 10/23/01 100 12.625 50 25.25 \n
buy 10/25/01 100 12.625 50 25.25 \n";

I split the data and sorted it with a Schwartzian transform:

my @data = split /\n\n/, $data;

@data = map  { join ' ', ( $_->[0], ( join '/', @{$_->[1]} ), $_->[2] ) }
        sort { $a->[1][2] <=> $b->[1][2]
            || $a->[1][0] <=> $b->[1][0]
            || $a->[1][1] <=> $b->[1][1] }
        map  { [ $_->[0], [split /\//, $_->[1]], $_->[2] ] }
        map  { [split( /\s+/, trim( $_ ), 3) ] } @data;

sub trim {
    my $data = shift;
    $data =~ s/^\s+//;
    $data =~ s/\s+$//;
    return $data;
}

When I sent the snippet, I also included an explanation of how the transform works. However, I'm wondering if this is an appropriate way to go. I need to teach new tricks to some of the old dogs that I work with and I can't help but wonder if an approach like this is likely to generate more confusion than enlightenment.

If any of you monks have experience teaching Perl to others, how would you approach something like that? map is a beautiful bastion of functional programming in the procedural world of Perl, but I suspect that taking the time to outline a procedural approach would have been better. It certainly would have been easier to understand. Would you strive to present a simplistic, but easy to understand solution, or one that more accurately reflects Perl's capabilities? I've been having a heck of a time getting across some of Perl's strengths to one of my coworkers and I need to start considering some different tactics. Should I dumb down Perl?
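For comparison, a procedural version of the same sort might look something like the sketch below. It assumes rows separated by two newlines and M/D/YY dates, as in the sample data; the names date_key and %key_for are made up for illustration.

```perl
use strict;
use warnings;

# Sample rows, assumed to be separated by two newlines as described above.
my $data = join "\n\n",
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25';

# Hypothetical helper: turn the "M/D/YY" field into a sortable "YYMMDD" string.
sub date_key {
    my ($row) = @_;
    my (undef, $date) = split ' ', $row;
    my ($month, $day, $year) = split m{/}, $date;
    return sprintf '%02d%02d%02d', $year, $month, $day;
}

# Step 1: split the scalar into rows.
my @rows = split /\n\n/, $data;

# Step 2: compute each row's key once and remember it in a hash.
my %key_for;
foreach my $row (@rows) {
    $key_for{$row} = date_key($row);
}

# Step 3: sort the rows by their precomputed keys.
my @sorted = sort { $key_for{$a} <=> $key_for{$b} } @rows;

print "$_\n" for @sorted;
```

Each step here can be inspected on its own, which is the main thing this version buys over the single chained expression.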

Oh, and feel free to clean up the code above, if you think my solution is overkill.

Update: Hmm... I guess this also goes back to my worries about maintainable code. What happens if I do something complicated that others are not likely to understand, but that seems like the best solution? Should I code for clarity? We've been interviewing some Perl programmers who are totally clueless ("strict just gets in the way", "local and my are the same thing", etc.) and if this is the norm, "baby Perl" may be appropriate. I'm beginning to despair of finding anyone worth hiring (who we can afford, that is :)

Come to think of it, I might be one of those 'clueless' programmers. I definitely have some production code I don't want anyone to see :( And yes, I'm dead serious about my possibly being one of those 'clueless' programmers (this is enough for another meditation). I know how to program Perl, but my knowledge of putting complex systems together is spotty at best. I could use a mentor in that regard, but no one else I work with is qualified.

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Helping Beginners (continued)
by japhy (Canon) on Oct 30, 2001 at 07:54 UTC
    Ok. Here's how I present any ST-involving response. Notice how explicit I make the functions to begin with, and how I then change the code to a more idiomatic style. I'll take the given request, and for example's sake, I'll assume the data was:
    @data = (
      'buy 1/23/01 100 12.625 50 25.25',
      'buy 09/1/01 100 12.625 50 25.25',
      'buy 10/23/01 100 12.625 50 25.25',
      'buy 10/25/01 100 12.625 50 25.25',
    );
    The first thing you need to do to be able to sort the data is isolate the field you want to sort by extracting it from your data:
    sub get_date {
      my $string = shift;
      my $date = (split ' ', $string)[1];  # second field
      return $date;
    }
    Now we can extract the dates from each line of data:
    for $line (@data) {
      push @dates, get_date($line);
    }
    We now have two parallel arrays, @data which holds the original data, and @dates which holds the date for each line, respectively. Now we need to sort @dates to get it in the correct order. The problem is that sorting ONE array will not help -- both arrays need to be sorted. We could sort the indices of one array, but we'll use a different approach, one that involves references. Instead of keeping track of the dates only, let's also include the other information as well, as elements of an anonymous array:
    for $line (@data) {
      push @dates, [ get_date($line), $line ];
    }
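As an aside, the index-sorting alternative mentioned above might be sketched as follows. The keys are written out by hand to keep the example short, and the names @keys and @order are made up for illustration.

```perl
use strict;
use warnings;

my @data = (
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# Parallel array of sortable YYMMDD keys, one per line of @data.
my @keys = ('011023', '010123', '010901');

# Sort the index numbers 0..$#data by looking up each index's key...
my @order = sort { $keys[$a] <=> $keys[$b] } 0 .. $#data;

# ...then use array slices to pull both arrays into that order.
my @sorted_data = @data[@order];
my @sorted_keys = @keys[@order];

print "$_\n" for @sorted_data;
```

This keeps the two arrays in step without references, at the cost of an extra level of indirection in the comparison.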
    For each element $e in the @dates array, $e->[0] is the date, and $e->[1] is the original line. Now we can move on to the actual problem of sorting the array. We need to sort the dates. What's the best format to sort dates in? Seconds? Well, maybe. But we don't have or need that granularity -- we have year, month, and day. Instead of using the form "MM/DD/YY" that the data comes in, let's use the form "YYMMDD". This will be of great use to us, because dates in the latter form can be sorted as regular numbers. So we need to change our get_date() function a bit, to extract the date and fix it:
    sub get_date {
      my $string = shift;
      my $date = (split ' ', $string)[1];
      my ($m, $d, $y) = split '/', $date;  # the data is month/day/year
      return sprintf "%02d%02d%02d", $y, $m, $d;
    }
    Now our function returns "YYMMDD", with each number zero-padded (that's what the "%02d" format means). Now we can sort the dates natively:
    @dates = sort { $a->[0] <=> $b->[0] } @dates;
    Before you panic, remember that the elements of @dates are array references, so $a->[0] is accessing the date portion of the element. If you've never used sort() before, $a and $b are the two elements being compared, and the <=> operator returns a value of -1, 0, or 1, depending on the relationship (less than, equal to, or greater than) between the two operands. Now, our last job is to extract the original data from the array. For this, we will use map(), which acts like a for-loop on a list.
    @data = map $_->[1], @dates;
    This extracts the second element from each array reference, and stores them in @data. Now we have working code. But let's make it more idiomatic. First, notice that we have three distinct stages in our code:
    1. date extraction
    2. sorting
    3. data restoration
    We do these one after the other, so we can try to combine them into one larger process:
    @data =
      restore(
        sort { $a->[0] <=> $b->[0] }
        extract(@data)
      );
    Notice how the stages now read from the bottom up? This is the standard appearance of Lisp-like code (and this code is indeed Lisp-like). Instead of creating two more functions, restore() and extract(), let's see what we can do with the existing function get_date(), and Perl's built-in map() function:
    # extract(@data)
    # becomes
    map [ get_date($_), $_ ], @data
    Notice how the extraction (which involves the creation of the array of array references) is really just an iteration of the get_date() function on each element of the array? Then, for restore(), we simply do:
    # restore(...)
    # becomes
    map $_->[1], ...
    Our code now looks like this:
    @data =
      map $_->[1],
      sort { $a->[0] <=> $b->[0] }
      map [ get_date($_), $_ ],
      @data;
    Lisp-ish code, indeed! (Who ever said Perl wasn't functional?) What you've just witnessed is the creation of a Schwartzian Transform (find its history elsewhere on the internet). It takes the form:
    @data =
      map { restore($_) }
      sort { ... }
      map { extract($_) }
      @data;
    which is (more or less) what our code now looks like. (Insert documentation references and what-not here.)
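Put together as one runnable script, the walkthrough above might look like the sketch below. It assumes the dates are month/day/year, as the sample data suggests, and stores the result in a separate @sorted array (an illustrative name) so the input stays untouched.

```perl
use strict;
use warnings;

my @data = (
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# Extract the second field and rewrite M/D/YY as a sortable YYMMDD string.
sub get_date {
    my $string = shift;
    my $date   = (split ' ', $string)[1];
    my ($m, $d, $y) = split '/', $date;
    return sprintf "%02d%02d%02d", $y, $m, $d;
}

# Read the stages from the bottom up, as in the text.
my @sorted =
    map  { $_->[1] }                    # 3. restore the original line
    sort { $a->[0] <=> $b->[0] }        # 2. sort on the precomputed date
    map  { [ get_date($_), $_ ] }       # 1. pair each line with its date
    @data;

print "$_\n" for @sorted;
```

Note that get_date() runs exactly once per line, however many comparisons sort() ends up making.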

    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

      japhy, that was great. Not only was it a beautiful breakdown of how you accomplished your end goal, but I now understand why my version was incredibly sloppy next to yours. I think that's a mistake I won't make again :)

      Cheers,
      Ovid


      Great explanation, japhy! Between this node and actually having a good reason to write a Schwartzian Transform, I now "grasp" what it means. I haven't felt this clever since I worked through Duff's device. Domo arigato!

      --
      :wq
Re: Helping Beginners
by blakem (Monsignor) on Oct 30, 2001 at 06:03 UTC
    The funny thing about this meditation is how closely it parallels the origins of the Schwartzian Transform itself. If I remember my perl lore correctly.....

    A beginner posted to c.l.p asking how to sort a file based on the last field in each line. merlyn responded with a brilliant but unexplained snippet that did exactly what was asked for. Tom Christiansen decided that this wasn't particularly helpful, since the newbie couldn't make heads or tails of it (nor could many of the more experienced coders, since it hadn't really been seen before). His response has been immortalized as the FMTEYEWTK about sort article. Rumor has it that this exchange is what led to the name 'Schwartzian Transform' and that scary looking snippet has become a perl idiom.

    Sorry to sidestep your questions... just thought I'd toss a little history in there. If nothing else, you should send the newbie a link to TC's explanation....

    -Blake

Re: Helping Beginners
by japhy (Canon) on Oct 30, 2001 at 05:41 UTC
    I usually give help in stages. I break the problem down into smaller parts, and then show how Perl integrates those smaller parts into a larger, yet still smooth, operation. That's what an ST is, anyway -- three (or so) small parts joined into a big one. It's best to explain them starting from scratch. More later, I'm off to a psychology experiment.


      That's fine when you're in a teaching environment.

      The problem comes when you're helping someone quickly find a solution to a concrete problem. You're faced with these alternatives:

      1. Explain a correct, efficient solution. Watch the eyes glaze over as soon as you get into anything complex.
      2. Give the correct, efficient solution and just say it works... great way to foster cargo-cult.
      3. Explain in simple chunks, building up to the complete picture, and when you're halfway through your victim will tell you "Don't have time for all that now, just tell me how to do it". Then go and beat your head against the nearest brick wall (concrete is good too).
      4. Give a simple solution that's correct and safe, if not optimal. But then the victim (er, beginner) doesn't learn as much.
      In an in-house environment you can send beginners to courses and give them assignments that will stretch them progressively. In a help mail-list that just isn't possible. You just have to excite a beginner's curiosity while holding their interest, and of course the balance is different for every individual.

      I wonder if the best response is to give and explain the 'correct, optimal' solution (for some definition of 'correct' or 'optimal'), together with the simple, safe, long solution, and hope that the comparison will whet the beginner's appetite. And that takes more time than most people have available :(

Re: Helping Beginners
by toma (Vicar) on Oct 30, 2001 at 12:50 UTC
    I have tried several approaches to the baby-perl versus no-holds-barred-perl problem. I think the best approach is to write code that is an achievable challenge for your audience to understand. It should be difficult enough that the reader feels a sense of accomplishment in reading it. It should not be so difficult that the reader does not stand a chance.

    More difficult code can be buried within a module. A module is an extension of the language itself. If properly written, it can be used without having to understand how it works. The audience is simply told, "Use this module and don't worry."
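As a rough illustration of that idea, the transform from this thread could be hidden behind a single function. The package name DateSort and the function sort_by_date are hypothetical, and in practice the package would live in its own .pm file rather than inline like this.

```perl
use strict;
use warnings;

# Hypothetical module, inlined here so the sketch runs as a single file.
{
    package DateSort;

    # Return the given lines sorted by the M/D/YY date in their second field.
    sub sort_by_date {
        my @lines = @_;
        return map  { $_->[1] }
               sort { $a->[0] <=> $b->[0] }
               map  {
                   my ($m, $d, $y) = split m{/}, (split ' ', $_)[1];
                   [ sprintf('%02d%02d%02d', $y, $m, $d), $_ ];
               } @lines;
    }
}

# The caller only needs to know the function's name and what it returns.
my @sorted = DateSort::sort_by_date(
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
);

print "$_\n" for @sorted;
```

The reader of the calling code never sees a map or an array reference, yet the efficient implementation is still what runs.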

    Japhy writes a clear explanation sufficient to guide a novice through the advanced code in the example. His explanation proves to us why we should all buy Japhy's book, should he decide to write it! Not all of us have the time or talent to write such a clear explanation. We should tame our code until it is barely within the reach of our audience. Japhy tames his code by showing how it is developed.

    I have the pleasure of working with a fellow who doesn't know perl at all, but he is confident that he will be able to follow, use, and modify my code. He has been reading my copy of Learning Perl. His willingness to stretch his capabilities makes him more valuable and more fun to work with. He knows he can ask me questions if he runs into something opaque. He doesn't want to suppress my creativity with his temporary personal limitations. Between his confidence and my somewhat conservative coding style, we enjoy the collaboration.

    It should work perfectly the first time! - toma

Re: Helping Beginners
by bluto (Curate) on Oct 30, 2001 at 06:03 UTC
    When I first saw the Schwartzian transform it took me a while to just decipher it (ok, I'm getting old). I was writing similar code (i.e. calculating sort fields once rather than during each compare), but obviously I was using a lot of intermediate arrays & hashes. It probably ran almost as fast (and certainly fast enough for me). I didn't learn how to sort with the transform, nor how to optimize, but rather how to unhinge my brain from writing in C and think of the problem totally differently, in perl.

    If I were teaching sorting to a beginner, I'd teach them how to write a compare function. The first one would probably just (inefficiently) parse each line each time it was compared. After that I'd piece together a transform using intermediate arrays. Then if they could grok that, I'd finally present the transform in its simplest form, where each element would look something like:

    [ $sortable_date, $original_string ]
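The first two teaching stages described above might be sketched like this. The helper sortable_date and the $parse_count counter are made up for illustration; the counter is only there to show how much parsing each stage does.

```perl
use strict;
use warnings;

my @data = (
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

my $parse_count = 0;    # counts how often a line gets parsed

# Hypothetical helper: build a YYMMDD key from the M/D/YY field.
sub sortable_date {
    my ($line) = @_;
    $parse_count++;
    my ($m, $d, $y) = split m{/}, (split ' ', $line)[1];
    return sprintf '%02d%02d%02d', $y, $m, $d;
}

# Stage 1: a compare function that re-parses both lines on every comparison.
sub by_date { sortable_date($a) <=> sortable_date($b) }
my @sorted_naive = sort by_date @data;
my $naive_parses = $parse_count;

# Stage 2: parse each line once into an intermediate array, then sort that.
$parse_count = 0;
my @pairs        = map { [ sortable_date($_), $_ ] } @data;
my @sorted_pairs = sort { $a->[0] <=> $b->[0] } @pairs;
my @sorted       = map { $_->[1] } @sorted_pairs;

print "naive parses: $naive_parses, cached parses: $parse_count\n";
```

Both stages produce the same order; the second parses each line exactly once, and the gap between the two counts grows with the size of the list.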
    In certain limited forums I think "baby Perl" is ok. Books, tutorials, college classes are all appropriate for this. Past that they should be expected to use the available resources (or be pointed to them). It just amazes me that folks actually interview for jobs that they would be, not just slightly, but totally incompetent in. These people just don't have the "right stuff". The "right stuff" here isn't knowing perl or being a programming guru. It's the drive that pushes you to learn about it, and the mental ability to actually be able to apply some of it from time to time.

    bluto

Re (tilly) 1: Helping Beginners
by tilly (Archbishop) on Oct 30, 2001 at 06:46 UTC
    My personal rule of thumb is that an answer is not good enough if I don't think that the recipient will get enough context to understand where it comes from conceptually. In that spirit, my usual way of explaining map is to draw an analogy with a Unix pipeline, since I think that pipelines are easier for people to grasp.

    For instance look at RE (tilly) 1: Schwartzian Transform.
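A small sketch of the pipeline analogy, using made-up data and an equally made-up shell pipeline for comparison:

```perl
use strict;
use warnings;

my @lines = ("pear 3\n", "apple 10\n", "fig 2\n");

# Roughly the shell pipeline:  cat lines | grep -v '^fig' | tr a-z A-Z
# Data enters at the bottom (@lines) and flows upward through each stage.
my @result =
    map  { uc }             # like: tr a-z A-Z
    grep { !/^fig/ }        # like: grep -v '^fig'
    @lines;

print @result;
```

The only real difference from the shell is direction: a pipeline reads left to right, while the chained list operators read from the bottom up.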

Re: Helping Beginners
by DamnDirtyApe (Curate) on Oct 30, 2001 at 10:57 UTC

    When I first took an interest in Perl, this site was what convinced me it was worth pursuing. When someone would post a particularly interesting problem, I was always amazed to see eight or ten completely different implementations. `Dumbing down Perl' will certainly improve novice comprehension, but please, don't stop giving the really clever solutions as well. TIMTOWTDI is an important theme around this place, and should be taught to beginners along with the code.


    _______________
    D a m n D i r t y A p e

Re: Helping Beginners
by 2501 (Pilgrim) on Oct 30, 2001 at 06:43 UTC
    How to help someone is often defined by what they need it for. If I was going to help a fellow programmer, I would go into more detail and relate it to common programming theory rather than teaching from the ground up.
    If I am teaching a beginner who is not a programmer, nor do they have the desire, love, or time to learn perl, then sometimes I cheat and teach them what they need to know with a very blackbox approach. More cause & effect than how & why.
    I have also found it is sometimes a little harder to teach programmers who are glued to C++. Sometimes it takes a bit to understand that perl's TIMTOWTDI can sometimes be an asset over the structure of C++.
Re: Helping Beginners
by social_mandog (Sexton) on Oct 30, 2001 at 07:21 UTC
    I remember when I wrote stuff like if(booleanVar==true){} and thought anyone who did differently was just showing off. Now I know better.

    Right now, the Schwartzian transform looks like line noise to me. Stuff like $_->[0] and  @{$_->[1]} is particularly hard to follow. I know that I need to allocate a few hours to figuring this out because it is an idiom that seems to come up a lot in (apparently) effective circles.

    I guess I'm ok with tricky constructs as long as they make things clearer once you understand them.
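For anyone in the same position, the two constructs mentioned above can be taken apart on a single element. The sketch below assumes the nested layout that Ovid's transform builds; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# One element of the kind the transform builds:
# [ first field, [ month, day, year ], rest of the line ]
my $row = [ 'buy', [ 10, 23, '01' ], '100 12.625 50 25.25' ];

my $action = $row->[0];         # follow the reference, take element 0: 'buy'
my $date   = $row->[1];         # element 1 is itself an array reference
my @parts  = @{ $row->[1] };    # @{ ... } expands that inner reference to a list
my $year   = $row->[1][2];      # short for $row->[1]->[2]: '01'

# Inside map or sort the same syntax applies, with $_, $a or $b in place of $row.
my $rebuilt = join ' ', $row->[0], join('/', @{ $row->[1] }), $row->[2];

print "$rebuilt\n";
```

Once $_ is read as "the current element, which happens to be a reference", $_->[0] and @{$_->[1]} are the same operations as above.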

Re(demerphq): Helping Beginners
by demerphq (Chancellor) on Nov 01, 2001 at 07:12 UTC
    Well, I've seen a lot of excellent replies, much better than anything I could post in terms of teaching, but a few comments. Part of the issue is the person you are dealing with. If they are going to have a hard time with the idea of map then an ST or GRT is not going to be an easy thing to describe. OTOH if the person is receptive to an idea of map then the idea of a transform, sort, transform-back isn't going to be so hard.

    Whatever the level of the person, I've found that a lot of the time two or three solutions can be the best. They'll pick the one they are most comfortable with, but at the same time (*hopefully*) be intrigued by the other possibilities. Then you can forward them on to the appropriate documentation and let them play.

    The other reason I posted was because I couldn't see why you are doing four steps instead of three or even, as I prefer, two. I played around with this for a bit and came up with three variations, all simpler (at least to me).

    My first solution was a straight replacement of your two-stage prepare with a one-stage prepare, and I cheated and lost the call to trim by using m// in list context.

    @data = map  { $_->[0] }
            sort { $a->[3] <=> $b->[3]      # YY
                 ||$a->[1] <=> $b->[1]      # MM
                 ||$a->[2] <=> $b->[2] }    # DD
            map  { [ m!^\s*(\D*(\d+)/(\d+)/(\d+)(?: [\d.]+)*)\s*$! ] } @idata;
            # indices: 0 = whole line, 1 = MM, 2 = DD, 3 = YY
    My next thought was that the date format sucked, and that maybe the sort logic could be simplified in one go, also that I probably would end up splitting it at some point so I might as well return a list of the parts. A bit more complex regex might also be nice.
    @data = sort { $a->[1] cmp $b->[1] }           # Sort by YYYY/MM/DD
            map  { my @p=m!^\s*([A-Za-z]+)\s+      # alpha word
                           (\d+)/(\d+)/(\d+)       # date MM DD YY
                           (?:\s+([\d.]+))         # Substitute Number regex here
                           (?:\s+([\d.]+))         # ..
                           (?:\s+([\d.]+))         # ..
                           (?:\s+([\d.]+))!x;      # Comments please
                   # Fix the date if this is still in use in 2050...
                   splice @p,1,3,sprintf("%04d/%02d/%02d",
                                 ($p[3]>50 ? $p[3]+1900 : $p[3]+2000),
                                 @p[1,2]);
                   # it deserves to produce incorrect results, after all
                   # 2 digit dates is madness
                   \@p}                            # return the fixed array
            @idata;
    But then I decided that I might not want to do that, and I might want it as fast as possible. In which case I wouldn't use an ST but a GRT:
    @data = map  {substr($_,3)}
            sort                 #lexicographical representation of the date
            map  { m!^\s*(\D*(\d+)/(\d+)/(\d+)(?: [\d.]+)*)\s*$!
                   && pack ("CCCA*",$4,$2,$3,$1)}
            @idata;
    The point being that these are the kind of ideas that I would probably show an interested colleague if I was asked.
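Spelled out in separate stages, the GRT idea above might be sketched as follows. This is a simplified variant rather than the exact code above: it prefixes the untouched line with the key instead of re-packing the captured text, and the intermediate array names are made up for illustration.

```perl
use strict;
use warnings;

my @idata = (
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# 1. Prefix each line with a fixed-width key: three bytes holding YY, MM, DD.
my @packed = map {
    my ($m, $d, $y) = m{(\d+)/(\d+)/(\d+)};
    pack('CCC', $y, $m, $d) . $_;
} @idata;

# 2. A plain string sort now orders by date; no comparison block is needed.
my @sorted_packed = sort @packed;

# 3. Strip the three key bytes to recover the original lines.
my @data = map { substr($_, 3) } @sorted_packed;

print "$_\n" for @data;
```

The speed comes from step 2: sort() with no block uses its built-in string comparison instead of calling Perl code for every pair. The one-byte fields only work while each of year, month, and day fits in 0..255, and two-digit years still sort wrongly across a century boundary.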

    Anyway Ovid thanks for the thought, and for provoking the thoughts you did, (japhy++), I had a good time with this one.

    BTW: I'm too tired now, but tomorrow I'll update this space with a link to the excellent article on sorting and the Guttman Rosler Transform (do a Super Search until then :)

    Yves / DeMerphq
    --
    Have you registered your Name Space?