Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Things I have learnt

by neilh (Pilgrim)
on Oct 22, 2004 at 02:40 UTC ( #401357=perlmeditation: print w/ replies, xml ) Need Help??

I have been coding in Perl for about 7 years now. However, I am not a programmer, so I always thought I was pretty good.

Once I was actually employed to write Perl, and spent three months developing a replacement for Crystal Reports using GD::Graph and CGI. The package eventually went on to make the company a lot of money as part of a larger product, but I had already left by that stage, so didn't reap the rewards.

Prior to this year, I was employed as a DBA, and rather than writing SQL, I wrote everything using the most excellent DBI and DBD::Oracle. I left behind a lot of code for my replacement, and I heard the other day that he had rewritten eveything in ksh and deleted my work.

As some of you know, for the last few months I have been working on a programme to synchronize the various directory servers we have here - Active Directory with Exchange, Sun One Directory Server, Oracle Internet Directory and the payroll system - using, among other things: Net::LDAP. This is the project that led me to the Monastery 12 weeks ago today.
About two weeks ago, I finally finished phase one of this project and am now taking a much anticipated break from LDAP.

The first thing I was told was that few people would help me until I learnt to use strict;. I took this advice to heart, and indeed within a few days I wrote everything using strictures.
From there the whole world of lexical scope was opened to me.
Within one week of joining the Monastery I started using subroutines. I was pretty proud of myself. I looked over the code that I had written and realised that a lot of it was repetitous, so I created a package of my own for all the stuff that I kept using, and just called it with use lib '/path/to/new-package';.

The best package that I have learnt to use is Tie::File. This is my all time favourite. In the old days (ie about 4 weeks ago) in order to increment a sequence in a file, I used to do something like:

open (FILE, "<file.txt"); my @file=<FILE>; close FILE; my $file[0]=$file[0] + 1; open (FILE, ">file.txt"); print FILE "$file[0]\n"; close FILE;
Now, of course, I do the much better:

tie my @file, 'Tie::File', 'file.txt'; $file[0]=$file[0] + 1; untie @file;
As I look over my directory where the last three month's work sits, I see that there are 5000 lines in the production area and about another 10500 lines in my test area. For me, this is a lot of work. In three years at my last position I produced less than 1000 lines of code total.

Where should I go from here? Personally, I'd love to be able to understand the Schwartzian Transform, but I think I have a long way to go before that becomes clear.

My thanks to you all for helping me on my journey to Perl enlightenment.


Comment on Things I have learnt
Select or Download Code
Replies are listed 'Best First'.
Re: Things I have learnt
by davido (Archbishop) on Oct 22, 2004 at 03:40 UTC

    At the risk of providing an inferior answer amid higher authorities, I'll try to explain the Schwartzian Transform.

    First, one has to understand why it is useful. Assume you want to sort a list by, for example, ignoring the first character in each element. In other words, you want to sort 'Babc' before "Abcd". You could write such a sort like this:

    my @sorted = sort { substr( $a, 1 ) cmp substr( $b, 1 ) } @unsorted;

    And that works fine. But remember that the comparison routine of a sort function gets called many times. ...possibly several times per element being sorted. It's somewhat inefficient calling substr twice for each comparison, over and over and over again as the list gets sorted.

    There is a better way. What if, instead of calling substr over and over again, multiple times per element being sorted, you called it once per element and stored the results? If you can create, store, and access this cached list fast enough, it will be more efficient than doing complex calculations over and over again on the same element throughout the course of the sort.

    The Schwartzian Transform does this. It creates a cache of whatever complex computation you had in mind, and saves it alongside the original data, so that it's easy to keep track of the real data and its cached pre-computed sort criteria at the same time.

    The most common form of the Schwartzian Transform takes a flat list, and turns it into a list of references to two-element arrays. Here's how our substr( $string, 1 ) might work in a ST.

    @original | @transform | [0] [1] _________ | _________________________ abc | abc bc asdf | asdf sdf jkas | jkas kas qwert | qwert wert uiop | uiop iop

    So in the above example:

    @original = qw/abc asdf jkas qwert uiop/;


    @transformed = ( [ 'abc' , 'bc' ], [ 'asdf' , 'sdf' ], [ 'jkas' , 'kas' ], [ 'qwert', 'wert' ], [ 'uiop' , 'iop' ] );

    Now lets look at the complete transform, as commonly seen:

    my @sorted = map { $_->[0] } sort { $a->[1] cmp $b->[1] } map { [ $_, substr( $_, 1 ) ] } @unsorted;

    Look at this from the bottom up:

    1. Start with a list called @unsorted
    2. Feed @unsorted into map. The bottom map takes each element of @unsorted and turns it into an anonymous array with two elements. The first is the original string, and the second is the cached complex calculation.
    3. Feed the list of array-refs to sort. Inside sort, dereference each array-ref to compare the 2nd elements (the cached versions).
    4. Finally, feed this list of anon-array-refs (now sorted into the final order) into the top map. This final step strips away the cached information, leaving only the original strings (the array-refs get thrown away too).
    5. The sorted original strings get assigned to @sorted

    In actuality, this particular example may not be all that more efficient, since substr is pretty fast, and anonymous array creation and dereferencing takes time. But for longer lists, and particularly for more complex sort computations, the ST shines.

    I hope this helps. :)


Re: Things I have learnt
by elusion (Curate) on Oct 22, 2004 at 03:54 UTC
Re: Things I have learnt
by pg (Canon) on Oct 22, 2004 at 03:05 UTC
    "I heard the other day that he had rewritten eveything in ksh and deleted my work."

    Quite a big surprise to me. I work in a place that Java and c are dominant, and they also have lots of ksh scripts. For every project I can decide the technical direction, I always require scripts to be done in Perl instead of ksh, and this is how Perl got introduced to the company, and slowly became popular. I believe that's the right direction. To me, Perl is a kind of programming language, but ksh is only a traditional scripting language that has very limited functionality.

    Even under this direction, I will never ask anyone to rewrite exsiting ksh script in Perl, as that's a waste of time. We only use Perl for new scripts. So delete your stuff, and do a total rewrite sounds very stupid to me. Even if there might be certain things can be improved in your scripts, they should just fix it in Perl.

    "Where should I go from here?"

    Nobody else can judge better than you yourself, as only you know yourself and the place you work. Instead of learning for learning, why not just try to do more things in Perl, that's the best way to learn.

    Also with the Perl knowledge you learned so far, come to this site more often, try to help people out, just as others helped you. Answering questions is also a good way to learn. Even give a bad answer and get corrected is a good way to learn.


    Thought about it more, I actually have a question: when you said that they rewrote your Perl scripts (DBI) in ksh, what eactly did they do? (1) execute queries through something like SQL*PLUS (2) or rewrite the program in something like Java, and execute through ksh scripts? or ...

      As far as I can work out, the wrappers were rewritten in ksh using calls to SqlPlus as you so sumised.


        They are heading the wrong way, don't worry about it, you cannot worry about other's mistake too much.

Re: Things I have learnt
by leriksen (Curate) on Oct 22, 2004 at 05:01 UTC
    OK, I too will bite.

    The value of the S::T is that you do all your potentially expensive functions calls once per value, rather than many times.

    Normally when you sort one list into another, the normal perl idioms go like this

    my @output = sort @input; # default sort using eq to compare values my @output = sort {$a <=> $b} @input; # numeric sort my @output = sort some_func @input; # custom algorithm

    Now lets have look at that last one - how often does some_func get called? Heres some code to show a potential worst case scenario

    #!/usr/bin/perl -w use strict; my @input = (1..6); my $calls; my $fibcalls; my @output = sort reorder @input; # how many calls to fib() ? sub reorder { $calls++; return fib($a) <=> fib($b); } sub fib { my $val = shift; $fibcalls++; $val == 0 ? return $val: $val == 1 ? return $val : return fib($val - 1) + fib($val - 2); } print "called $calls times - fib calls $fibcalls\n";

    Output is

    cd /home/le6303/src/perl/ /usr/bin/perl -w "/home/le6303/src/perl/" called 9 times - fib calls 150

    OK, so a fibonacci sequence is exagerating the impact, but even so, what if some_func was doing some DBI-based lookup, using LWP or something else potential slow or expensive ?

    Wouldn't it be better that we call some_func once for each unique value ?

    Well we can. Looking at what we have so far, sort takes an input list (and potentially a function to operate on elements of that list) and returns a list. Wouldn't it be better if, instead of sort calling some_func repeated, often for a value it has already seen, we gave sort a list of values with some_func already applied ?

    What we need is a function that takes a list, applies a function to each element once and once only, and returns a list of those function applications.

    That function is map

    We can write a trivial implementation of map thus

    code>sub my_map { my ($function, @values) = @_; my @results; foreach my $value (@values) { push @results, $function->($value); } return @results; } </code>

    Putting that together, we can show my_map applied to a list thus

    #!/usr/bin/perl -w use strict; my @input = (1..6); my @output = my_map(\&double, @input); sub double { return 2 * shift; } sub my_map { my ($function, @values) = @_; my @results; foreach my $value (@values) { push @results, $function->($value); } return @results; } use Data::Dumper; print Dumper \@output;;

    Results are

    cd /home/le6303/src/perl/ /usr/bin/perl -w "/home/le6303/src/perl/" $VAR1 = [ 2, 4, 6, 8, 10, 12 ];

    So to proceed, we want to pass the list of values that map has already processed to sort, and have sort compare those.

    Like this

    my @output = sort {$a <=> $b} map {fib($_)} @input;

    But we have a problem now - @output is the sorted list of values that have had fib() applied to them, not the original list. If we put this together with our original example, we can see it more clearly

    #!/usr/bin/perl -w use strict; my @input = (1..6); my $calls; my $fib_calls; my @output = sort {$calls++; $a <=> $b} map {fib($_)} @input; sub fib { my $val = shift; $fib_calls++; $val == 0 ? return $val: $val == 1 ? return $val : return fib($val - 1) + fib($val - 2); } print "called $calls times - fib $fib_calls\n"; use Data::Dumper; print Dumper(\@output);
    Output is

    cd /home/le6303/src/perl/ /usr/bin/perl -w "/home/le6303/src/perl/" called 9 times - fib 58 $VAR1 = [ 1, 1, 2, 3, 5, 8 ];

    We only made 58 fib calls this time, but the output is wrong - it is the sorted fibonacci sequence, not the original list. How can we fix this ?

    Map to the rescue. What we do is have the first map return a list of tuple's - in other word, pairs of values. The tuples are made up of the original value and the value with some_func (fib) applied e.g.

    my @output = map {[$_, fib($_)]} @input;
    Output of this is
    cd /home/le6303/src/perl/ /usr/bin/perl -w "/home/le6303/src/perl/" $VAR1 = [ [ 1, 1 ], [ 2, 1 ], [ 3, 2 ], [ 4, 3 ], [ 5, 5 ], [ 6, 8 ] ];

    See how we have the pairs (value, some_func(value)) ?

    So what we want now is for sort to use the second value in the pair to compare, but use the first value in generating the output list. Well sort wont do that, but we can use map again - it takes a list, does a manipulation to each element and returns a list of those manipulations. Code for that is

    my @output = map {$_->[0]} # extract just the original # value from the tuple sort {$a->[1] <=> $b->[1]} # sort based on the # computed value map {[$_, some_func($_)]} # create (value, computed) # tuples @input;

    Output is as required


    use brain;

Re: Things I have learnt
by kappa (Chaplain) on Oct 22, 2004 at 11:06 UTC
    In short: read books. Those, mostly with funny animals :)

    This was my way and I'm really happy with it. First, self-teaching (kinda falling in love) then good books (kinda getting to know closer and closer and eventually marriage).

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlmeditation [id://401357]
Approved by davido
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (5)
As of 2016-02-07 05:06 GMT
Find Nodes?
    Voting Booth?

    How many photographs, souvenirs, artworks, trophies or other decorative objects are displayed in your home?

    Results (247 votes), past polls