Re: Helping Beginners (continued)
by japhy (Canon) on Oct 30, 2001 at 07:54 UTC
Ok. Here's how I present any ST-involving response. Notice how explicit I make the functions to begin with, and how I then change the code to a more idiomatic style. I'll take the given request, and for example's sake, I'll assume the data was:
@data = (
'buy 1/23/01 100 12.625 50 25.25',
'buy 09/1/01 100 12.625 50 25.25',
'buy 10/23/01 100 12.625 50 25.25',
'buy 10/25/01 100 12.625 50 25.25',
);
The first thing you need to do to be able to sort the data is isolate the field you want to sort by extracting it from your data:
sub get_date {
    my $string = shift;
    my $date = (split ' ', $string)[1]; # second field
    return $date;
}
Now we can extract the dates from each line of data:
for $line (@data) {
push @dates, get_date($line);
}
We now have two parallel arrays, @data which holds the original data, and @dates which holds the date for each line, respectively. Now we need to sort @dates to get it in the correct order. The problem is that sorting ONE array will not help -- both arrays need to be sorted. We could sort the indices of one array, but we'll use a different approach, one that involves references. Instead of keeping track of the dates only, let's also include the other information as well, as elements of an anonymous array:
for $line (@data) {
push @dates, [ get_date($line), $line ];
}
For each element $e in the @dates array, $e->[0] is the date, and $e->[1] is the original line. Now we can move on to the actual problem of sorting the array. We need to sort the dates. What's the best format to sort dates in? Seconds? Well, maybe. But we don't have or need that granularity -- we have year, month, and day. Instead of using the form "MM/DD/YY", let's use the form "YYMMDD". This will be of great use to us, because dates in the latter form can be sorted as regular numbers. So we need to change our get_date() function a bit, to extract the date and fix it:
sub get_date {
    my $string = shift;
    my $date = (split ' ', $string)[1];
    my ($m, $d, $y) = split '/', $date;   # the data is month/day/year
    return sprintf "%02d%02d%02d", $y, $m, $d;
}
Now our function returns "YYMMDD", with each number zero-padded (that's what the "%02d" format means). Now we can sort the dates natively:
@dates = sort { $a->[0] <=> $b->[0] } @dates;
Before you panic, remember that the elements of @dates are array references, so $a->[0] is accessing the date portion of the element. If you've never used sort() before, $a and $b are the two elements being compared, and the <=> operator returns a value of -1, 0, or 1, depending on the relationship (less than, equal to, or greater than) between the two operands. Now, our last job is to extract the original data from the array. For this, we will use map(), which acts like a for-loop on a list.
@data = map $_->[1], @dates;
This extracts the second element from each array reference, and stores them in @data. Now we have working code. But let's make it more idiomatic. First, notice that we have three distinct stages in our code:
- date extraction
- sorting
- data restoration
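Here is a sketch of those three stages run end to end as one script. It assumes the second field is month/day/two-digit-year, as in the sample data; the shuffled input order and the @sorted variable are only there for illustration.

```perl
use strict;
use warnings;

my @data = (
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# Assumption: the date field is month/day/two-digit-year.
sub get_date {
    my $string = shift;
    my $date = (split ' ', $string)[1];
    my ($m, $d, $y) = split '/', $date;
    return sprintf "%02d%02d%02d", $y, $m, $d;   # "YYMMDD"
}

# Stage 1, date extraction: pair each sort key with its original line.
my @dates;
for my $line (@data) {
    push @dates, [ get_date($line), $line ];
}

# Stage 2, sorting: compare only the precomputed keys.
@dates = sort { $a->[0] <=> $b->[0] } @dates;

# Stage 3, data restoration: keep just the original lines.
my @sorted = map { $_->[1] } @dates;

print "$_\n" for @sorted;
```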
We do these one after the other, so we can try to combine them into one larger process:
@data =
restore(
sort { $a->[0] <=> $b->[0] }
extract(@data)
);
Notice how the stages now read from the bottom up? This is the standard appearance of Lisp-like code (and this code is indeed Lisp-like). Instead of creating two more functions, restore() and extract(), let's see what we can do with the existing function get_date(), and Perl's built-in map() function:
# extract(@data)
# becomes
map [ get_date($_), $_ ], @data
Notice how the extraction (which involves the creation of the array of array references) is really just an iteration of the get_date() function on each element of the array? Then, for restore(), we simply do:
# restore(...)
# becomes
map $_->[1], ...
Our code now looks like this:
@data =
map $_->[1],
sort { $a->[0] <=> $b->[0] }
map [ get_date($_), $_ ],
@data;
Lisp-ish code, indeed! (Who ever said Perl wasn't functional?) What you've just watched being built is called a Schwartzian Transform (find its history elsewhere on the internet). It takes the form:
@data =
map { restore($_) }
sort { ... }
map { extract($_) }
@data;
which is (more or less) what our code now looks like. (Insert documentation references and what-not here.)
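For anyone who wants to paste and run it, here is the finished transform as a self-contained sketch. It again assumes month/day/two-digit-year dates; the input order is shuffled on purpose and @sorted is just an illustrative name.

```perl
use strict;
use warnings;

my @data = (
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
);

# Assumption: the date field is month/day/two-digit-year.
sub get_date {
    my $string = shift;
    my ($m, $d, $y) = split '/', (split ' ', $string)[1];
    return sprintf "%02d%02d%02d", $y, $m, $d;   # "YYMMDD"
}

# Read from the bottom up: extract, sort, restore.
my @sorted =
    map  { $_->[1] }                   # restore the original line
    sort { $a->[0] <=> $b->[0] }       # sort on the precomputed key
    map  { [ get_date($_), $_ ] }      # extract: [ key, original line ]
    @data;

print "$_\n" for @sorted;
```

The braces around each map block are optional here; they only make the three stages line up visually.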
_____________________________________________________
Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
japhy, that was great. Not only was it a beautiful breakdown of how you accomplished your end goal, but I now understand why my version was incredibly sloppy next to yours. I think that's a mistake I won't make again :)
Cheers,
Ovid
Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
Great explanation, japhy! Between this node
and actually having a good reason to write a Schwartzian
Transform, I now "grasp" what it means. I haven't felt
this clever since I worked through Duff's device.
Domo arigato!
--
:wq
Re: Helping Beginners
by blakem (Monsignor) on Oct 30, 2001 at 06:03 UTC
The funny thing about this meditation is how closely it parallels the origins of the Schwartzian Transform itself. If I remember my perl lore correctly.....
A beginner posted to c.l.p asking how to sort a file based on the last field in each line. merlyn responded with a brilliant but unexplained snippet that did exactly what was asked for. Tom Christiansen decided that this wasn't particularly helpful, since the newbie couldn't make heads or tails of it (nor could many of the more experienced coders, since it hadn't really been seen before). His response has been immortalized as the FMTEYEWTK about sort article. Rumor has it that this exchange is what led to the name 'Schwartzian Transform' and that scary looking snippet has become a perl idiom.
Sorry to sidestep your questions... just thought I'd toss a little history in there. If nothing else, you should send the newbie a link to TC's explanation....
-Blake
Re: Helping Beginners
by japhy (Canon) on Oct 30, 2001 at 05:41 UTC
I usually give help in stages. I break the problem down into smaller parts, and then show how Perl integrates those smaller parts into one larger, yet still smooth, operation. That's what an ST is, anyway -- three (or so) small parts joined into a big one. It's best to explain them starting from scratch. More later, I'm off to a psychology experiment.
That's fine when you're in a teaching environment.
The problem comes when you're helping someone find a quick solution to a concrete problem. You're faced with these alternatives:
- Explain a correct, efficient solution. Watch the eyes glaze over as soon as you get into anything complex.
- Give the correct, efficient solution and just say it works... great way to foster cargo-cult.
- Explain in simple chunks building up to the complete picture, and when you're half way through your victim will tell you "Don't have time for all that now, just tell me how to do it". Then go and beat your head against the nearest brick wall (concrete is good too)
- Give a simple solution that's correct and safe, if not optimal. But then the beginner doesn't learn as much.
In an in-house environment you can send beginners to courses and give them assignments that will stretch them progressively. In a help mail-list that just isn't possible. You just have to excite a beginner's curiosity while holding their interest, and of course the balance is different for every individual.
I wonder if the best response is to give/explain the 'correct, optimal' (for some definition of 'correct' or 'optimal'), together with the simple safe long solution, and hope that the comparison will whet the beginner's appetite. And that takes more time than most people have available :(
Re: Helping Beginners
by toma (Vicar) on Oct 30, 2001 at 12:50 UTC
I have tried several approaches to the baby-perl versus
no-holds-barred-perl problem. I think the best approach
is to write code that is an achievable challenge for your
audience to understand.
It should be difficult enough that the reader
feels a sense of accomplishment in reading it.
It should not be so difficult that the reader does not
stand a chance.
More difficult code can be buried within a module.
A module is an extension of the language itself.
If properly written, it can be used without having
to understand how it works. The audience is simply
told, "Use this module and don't worry."
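As a rough sketch of that idea, the transform from japhy's post could be tucked behind a single function. The package name DateSort and the function by_date are made up for this example, and it assumes the second field of each line is a month/day/two-digit-year date.

```perl
use strict;
use warnings;

# Hypothetical module; in practice this part would live in DateSort.pm.
package DateSort;

# Return the given lines sorted by the MM/DD/YY date in their second field.
sub by_date {
    return
        map  { $_->[1] }
        sort { $a->[0] <=> $b->[0] }
        map  {
            my ($m, $d, $y) = split '/', (split ' ', $_)[1];
            [ sprintf("%02d%02d%02d", $y, $m, $d), $_ ]
        }
        @_;
}

package main;

# The caller only needs to know the function name, not how it works.
my @sorted = DateSort::by_date(
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

print "$_\n" for @sorted;
```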
Japhy writes a clear explanation sufficient to guide
a novice through the advanced code in the example.
His explanation proves to us why
we should all buy Japhy's book,
should he decide to write it!
Not all of us have the time or talent
to write such a clear explanation.
We should tame our code until it is
barely within the reach of our audience.
Japhy tames his code by showing how it
is developed.
I have the pleasure of working
with a fellow who doesn't know perl at all,
but he is confident that he will be able
to follow, use, and modify my code.
He has been reading my copy of
Learning Perl.
His willingness to stretch his capabilities
makes him more valuable and more fun to work with.
He knows he can ask me questions
if he runs into something opaque.
He doesn't want to suppress my creativity
with his temporary personal limitations.
Between his confidence and my somewhat conservative
coding style, we enjoy the collaboration.
It should work perfectly the first time! - toma
Re: Helping Beginners
by bluto (Curate) on Oct 30, 2001 at 06:03 UTC
When I first saw the Schwartzian transform it took me
a while to just decipher it (ok, I'm getting old). I
was writing similar code (i.e. calculating sort fields
once rather than during each compare), but obviously
I was using a lot of intermediate arrays & hashes. It
probably ran almost as fast (and certainly fast enough
for me). I didn't learn how to sort with the transform,
nor how to optimize, but rather how to unhinge my brain
from writing in C and think of the problem totally
differently, in perl.
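A sketch of what that intermediate-hash style might look like: the sort field is calculated once per line and stored, and the comparison only does lookups. The %key_for name and the sample lines are illustrative, and the dates are assumed to be month/day/two-digit-year.

```perl
use strict;
use warnings;

my @data = (
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# Intermediate hash: original line => sortable "YYMMDD" key.
# Identical lines share one entry, which is harmless for sorting.
my %key_for;
for my $line (@data) {
    my ($m, $d, $y) = split '/', (split ' ', $line)[1];
    $key_for{$line} = sprintf "%02d%02d%02d", $y, $m, $d;
}

# The comparison no longer parses anything.
my @sorted = sort { $key_for{$a} <=> $key_for{$b} } @data;

print "$_\n" for @sorted;
```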
If I were teaching sorting to a beginner, I'd teach them
how to write a compare function. The first one would
probably just (inefficiently) parse each line each time
it was compared. After that I'd piecemeal a transform
using intermediate arrays. Then if they could grok
that I'd finally present the
transform in its simplest form (i.e. each element
would look something like
[ $sortable_date, $original_string ]).
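The first step in that progression, a compare function that re-parses both lines on every comparison, might look something like this sketch. The names date_key and by_date and the call counter are only for illustration, and the dates are assumed to be month/day/two-digit-year.

```perl
use strict;
use warnings;

my @data = (
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 10/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

my $calls = 0;   # counts how often a line gets parsed

# Turn a line into a sortable "YYMMDD" key.
sub date_key {
    my $line = shift;
    $calls++;
    my ($m, $d, $y) = split '/', (split ' ', $line)[1];
    return sprintf "%02d%02d%02d", $y, $m, $d;
}

# Compare function: parses both lines every time sort() calls it.
sub by_date { date_key($a) <=> date_key($b) }

my @sorted = sort by_date @data;

print "$_\n" for @sorted;
print "date_key was called $calls times for ", scalar(@data), " lines\n";
```

The call count grows faster than the number of lines, which is the cost the transform removes by parsing each line exactly once.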
In certain limited forums I think "baby Perl" is ok. Books,
tutorials, college classes are all appropriate for this.
Past that they should be expected to use the available
resources (or be pointed to them). It
just amazes me that folks actually interview for jobs that
they would be, not just slightly, but totally incompetent in.
These people just don't have the "right stuff". The
"right stuff" here isn't knowing perl or being a programming
guru. It's the drive that pushes you to learn about it,
and the mental ability to actually be able to apply some
of it from time to time.
bluto
Re (tilly) 1: Helping Beginners
by tilly (Archbishop) on Oct 30, 2001 at 06:46 UTC
Re: Helping Beginners
by DamnDirtyApe (Curate) on Oct 30, 2001 at 10:57 UTC
When I first took an interest in Perl, this site was what
convinced me it was worth pursuing. When someone would
post a particularly interesting problem, I was always
amazed to see eight or ten completely different
implementations. `Dumbing down Perl' will certainly
improve the novice comprehension, but please, don't stop
giving the really clever solutions as well. TIMTOWTDI
is an important theme around this place, and should be
taught to beginners along with the code.
_______________
DamnDirtyApe
Re: Helping Beginners
by 2501 (Pilgrim) on Oct 30, 2001 at 06:43 UTC
How to help someone is often defined by what they need it for. If I was going to help a fellow programmer, I would go into more detail and relate it to common programming theory rather than teaching from the ground up. If I am teaching a beginner who is not a programmer, and who doesn't have the desire, love, or time to learn perl, then sometimes I cheat and teach them what they need to know with a very blackbox approach. More cause & effect than how & why.
I have also found it is sometimes a little harder to teach programmers who are glued to C++. Sometimes it takes a bit to understand that perl's TIMTOWTDI can sometimes be an asset over the structure of C++.
Re: Helping Beginners
by social_mandog (Sexton) on Oct 30, 2001 at 07:21 UTC
I remember when I wrote stuff like if(booleanVar==true){} and thought anyone who did differently was just showing off. Now I know better.
Right now, the Schwartzian transform looks like line noise to me. Stuff like $_->[0] and @{$_->[1]} is particularly hard to follow. I know that I need to allocate a few hours to figuring this out because it is an idiom that seems to come up a lot in (apparently) effective circles.
I guess I'm ok with tricky constructs as long as they make things clearer once you understand them.
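For what it's worth, those two pieces of syntax can be taken apart in isolation. This is only a small sketch with made-up variable names: $_->[0] means "element 0 of the array that $_ refers to", and @{ ... } turns an array reference back into the array itself.

```perl
use strict;
use warnings;

# A reference to a two-element array: [ sort key, original line ].
my $pair = [ '010123', 'buy 1/23/01 100 12.625 50 25.25' ];

my $first  = $pair->[0];   # the key, '010123'
my $second = $pair->[1];   # the original line

# An array reference stored inside another array.
my $nested = [ 'buy', [ 100, 12.625, 50, 25.25 ] ];

# @{ ... } dereferences the inner reference to get the whole array back.
my @numbers = @{ $nested->[1] };

# Inside map, $_ is each element in turn, so $_->[0] picks out each key.
my @keys = map { $_->[0] } (
    $pair,
    [ '010901', 'buy 09/1/01 100 12.625 50 25.25' ],
);

print "first key: $first\n";
print "numbers:   @numbers\n";
print "all keys:  @keys\n";
```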
Re(demerphq): Helping Beginners
by demerphq (Chancellor) on Nov 01, 2001 at 07:12 UTC
Well, I've seen a lot of excellent replies, much better than anything I could post in terms of teaching, but a few comments. Part of the issue is the person you are dealing with. If they are going to have a hard time with the idea of map then an ST or GRT is not going to be an easy thing to describe. OTOH if the person is receptive to the idea of map then the idea of a transform, sort, transform-back isn't going to be so hard.
Whatever the level of the person, I've found that a lot of the time two or three solutions can be the best. They'll pick the one they are most comfortable with, but at the same time (*hopefully*) be intrigued by the other possibilities. Then you can forward them on to the appropriate documentation and let them play.
The other reason I posted was because I couldn't see why you are doing four steps instead of three or even, what I prefer, two. I played around with this for a bit and came up with three variations, all simpler (at least to me). My first solution was a straight replacement of your two-stage prepare with a one-stage prepare, and I cheated and lost the call to trim, using m// in list context.
@data = map { $_->[0] }
sort { $a->[3] <=> $b->[3] # YY
||$a->[1] <=> $b->[1] # MM
||$a->[2] <=> $b->[2]} # DD
map { [ m!^\s*(\D*(\d+)/(\d+)/(\d+)(?: [\d.]+)*)\s*$! ] }
@idata; # 0 1MM 2DD 3YY
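To try the multi-key comparison on its own, here is a simplified sketch that builds the same [ line, MM, DD, YY ] records with split instead of the regex above. The sample lines (including the "sell" line from a different year) are invented for the example, and two-digit years are assumed to fall in the same century.

```perl
use strict;
use warnings;

my @idata = (
    'buy 10/25/01 100 12.625 50 25.25',
    'sell 12/31/00 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

my @data =
    map  { $_->[0] }                  # restore the original line
    sort {    $a->[3] <=> $b->[3]     # year first
           || $a->[1] <=> $b->[1]     # then month
           || $a->[2] <=> $b->[2] }   # then day
    map  { [ $_, split('/', (split ' ', $_)[1]) ] }   # [ line, MM, DD, YY ]
    @idata;

print "$_\n" for @data;
```

Each <=> returns 0 when its two fields are equal, so || falls through to the next field only on a tie.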
My next thought was that the date format sucked, and that maybe the sort logic could be simplified in one go, also that I probably would end up splitting it at some point so I might as well return a list of the parts. A bit more complex regex might also be nice.
@data = sort { $a->[1] cmp $b->[1] } # Sort by YYYY/MM/DD
map { my @p=m!^\s*([A-Za-z]+)\s+ # alpha word
(\d+)/(\d+)/(\d+) # date MM DD YY
(?:\s+([\d.]+)) # Substitute Number regex here
(?:\s+([\d.]+)) # ..
(?:\s+([\d.]+)) # ..
(?:\s+([\d.]+))!x; # Comments please
# Fix the date if this is still in use in 2050...
splice @p,1,3,sprintf("%04d/%02d/%02d",
($p[3]>50 ? $p[3]+1900 : $p[3]+2000), @p[1,2]);
# it deserves to produce incorrect results, after all
# 2 digit dates is madness
\@p} # return the fixed array
@idata;
But then I decided that I might not want to do that, and I might want it as fast as possible. In which case I wouldn't use an ST but a GRT:
@data = map {substr($_,3)}
sort #lexicographical representation of the date
map { m!^\s*(\D*(\d+)/(\d+)/(\d+)(?: [\d.]+)*)\s*$!
&& pack ("CCCA*",$4,$2,$3,$1)}
@idata;
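A small sketch of why the bare sort works there: pack's C format stores each number as a single byte, so year, month, and day become a three-byte prefix that compares correctly as a plain string. This version uses split rather than the regex above, the @packed array is only for illustration, and it assumes each value fits in one byte (0 to 255).

```perl
use strict;
use warnings;

my @idata = (
    'buy 10/25/01 100 12.625 50 25.25',
    'buy 1/23/01 100 12.625 50 25.25',
    'buy 09/1/01 100 12.625 50 25.25',
);

# Prefix each line with three bytes: year, month, day.
my @packed = map {
    my ($m, $d, $y) = split '/', (split ' ', $_)[1];
    pack 'CCCA*', $y, $m, $d, $_;
} @idata;

# A default string sort compares those prefix bytes first;
# substr() then strips the three-byte prefix again.
my @data = map { substr $_, 3 } sort @packed;

print "$_\n" for @data;
```

Lines with the same date end up ordered by the rest of the line, since the original text is part of the sorted string.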
The point being that these are the kind of ideas that I would probably show an interested colleague if I was asked.
Anyway Ovid thanks for the thought, and for provoking the thoughts you did, (japhy++), I had a good time with this one.
BTW: I'm too tired now, but tomorrow I'll update this space with a link to the excellent article on sorting and the Guttman Rosler Transform (do a Super Search until then :)
Yves / DeMerphq
--
Have you registered your Name Space?