When do you function?

by zdog (Priest)
on Dec 27, 2000 at 08:38 UTC ( [id://48381] )

Today, I tried to write another Perl program (the first one in a while, though). I spent quite a bit of time on it. But anyway, that doesn't matter; what matters (at least for this post) is what I began debating afterwards: when do I, and when should I, put my code into functions?

After thinking it over for a while, I came up with several reasons why people (or maybe just me) should use functions at all:

  1. To decrease repetition by calling the same function several times instead of cutting and pasting the same code.
  2. To clean up code so it is easier to follow, and to let the reader, programmer, and debugger find mistakes (and anything else they might want to alter) more easily.
  3. To alter the variables used in a piece of code more quickly and efficiently.

Each of these reasons has some validity and could well justify a function in the right situation; however, each also has some disadvantages, which I also pondered:
  1. Loops, loops, loops. Usually in such a situation a loop will make more sense than a function, but do not discount functions for this purpose.
  2. To make code easier to follow, use comments. That's what they're for, isn't it? Also, jumping from function to function can get confusing at times.
  3. This can be done with a for() loop, whether it is cycling through a set of numbers ( for (my $i = 0; $i < 10; $i++) { ... } ) or going through an array with many random variables ( for (@array_with_many_random_variables) { ... } ) (see the sketch after this list).
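
For what it's worth, here is a minimal sketch of point 3 (the variable names and the trimming task are made up for illustration): the same cleanup is applied to several scalars with a plain loop over references, and the function version is shown next to it for comparison.

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Instead of cutting and pasting the same trimming code for each scalar,
    # loop over references to them -- no function needed:
    my ($name, $city, $zip) = ("  Zenon ", " Chicago  ", " 60645 ");
    for my $field (\$name, \$city, \$zip) {
        $$field =~ s/^\s+|\s+$//g;    # strip leading/trailing whitespace
    }

    # The function version of the same thing, for when the cleanup grows
    # beyond one line:
    sub trim {
        my ($str) = @_;
        $str =~ s/^\s+|\s+$//g;
        return $str;
    }
    ($name, $city, $zip) = map { trim($_) } ($name, $city, $zip);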

With all of this in mind, I find that functions can help in many ways in programming in general; however, they should only be used when they are the best option. In conclusion, before putting code into a function, one should consider loops, comments, and all other possible means of achieving the goal, weigh the advantages and disadvantages of each, and not make functions just for the sake of making functions. (At this point you say, "Duh, zdog! Isn't that what you should do with everything in life? Some major breakthrough... Geeze, and I just wasted a whole 1 minute and 34 seconds* reading this.")

This is what I think on the topic. What are your thoughts about it? When do you use functions?

Zenon Zabinski | zdog | Zenon.Zabinski03@students.bcp.org

*Disclaimer: The actual time it takes to read this post may vary.

Replies are listed 'Best First'.
Re (tilly) 1: When do you function?
by tilly (Archbishop) on Dec 27, 2000 at 09:16 UTC
    Hopefully most of the time. :-)

    Seriously, I find that my average function is about 10 lines. Some are shorter - a lot shorter. A few are much longer. But that seems to be the average for me.

    Here is a list of reasons from chapter 5 of Code Complete to think about. I won't copy explanations, just the reasons:

    1. Reducing complexity
    2. Avoiding duplicate code
    3. Limiting effects of changes
    4. Hiding sequences
    5. Improving performance (optimize later)
    6. Making central points of control
    7. Hiding data structures
    8. Hiding global data
    9. Hiding pointer operations
    10. Promoting code reuse
    11. Planning for a family of programs
    12. Making a section of code reusable
    13. Improving portability
    14. Isolating complex operations
    15. Isolating use of non-standard language functions
    16. Simplifying complicated boolean tests

    I have found that all of these benefits still hold in Perl. Well, performance usually takes a small hit, but you are left in a position to optimize where it counts later. And you shouldn't have non-standard language functions. But in practice I have noticed portability issues from time to time.

    So while that list doesn't hold perfectly for Perl, it is still generally on target.

    Note in particular that comments explaining what you intended at one point are not a good substitute for clear code. Should you change the code later, the comments will often remain to confuse you. Also, deeply nested loops may not take many lines, but they make it much harder to separate the forest from the trees.
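
    To make item 16 concrete, here is a minimal sketch (the data and field names are invented for illustration): the complicated test gets a name, and the call site reads like prose.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my $user = { age => 19, banned => 0, confirmed_email => 1 };

        # The named predicate replaces a raw condition like
        #   if ($user->{age} >= 18 && !$user->{banned} && $user->{confirmed_email}) { ... }
        sub can_post {
            my ($u) = @_;
            return $u->{age} >= 18 && !$u->{banned} && $u->{confirmed_email};
        }

        print "allowed to post\n" if can_post($user);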

      When I was mid-BS (ahem) in CS, I used to write all my programs "top down": just create a bunch of function/procedure names that would handle the problem and pencil in the loops (on the backs of output pages, with a couple of sheets for 'global' vars and a page for function prototypes, details to be filled in later). As a mid-level programmer now, performance is so rarely a concern (what I write runs less often (e.g. daily) than it's worth to squeeze (or 'bum') any extra speed out of it) that the value of a clean loop calling meaningfully named functions outweighs any loss from the internal context switches etc.

      Keeping the flow clean and localizing the gritty details makes life much easier, and makes the final program that much more maintainable: it's way easier to rewrite &Get_Image_Path to handle the addition of a separate image server box than to go back and find and handle all the spots that were calling "Get_Image_Path($case_number, $document_number)".
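
      A rough sketch of what that central point of control buys you (the path layout and $IMAGE_HOST are invented here): every caller keeps saying Get_Image_Path($case_number, $document_number), and only this one body changes when the images move to a dedicated server.

          #!/usr/bin/perl
          use strict;
          use warnings;

          # Was '' back when the images lived on the local box; changing this
          # one variable (and this one sub) is the whole migration.
          my $IMAGE_HOST = 'http://images.example.com';

          sub Get_Image_Path {
              my ($case_number, $document_number) = @_;
              return "$IMAGE_HOST/cases/$case_number/docs/$document_number.tif";
          }

          print Get_Image_Path(1234, 7), "\n";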

      a

      As in a's comment, I also tend to write top down using wishful programming. (Where wishful programming is using names of functions I haven't written yet.) This has the benefit of not having to think about everything at once, and if you create stub functions it will actually run and you can slowly build it up. Ah ha, starting to use some XP techniques! (Except I learnt to do this before XP was around...)

      When it comes to breaking functions up, if I have a function which is trying to do several different things, it is usually time to break it up. Likewise, a function which is more than a couple of screens long is about due for a break up. Although the screens one is very relative... Especially when I'm writing CGI scripts which are generating forms, they can get rather lengthy.

      Over the last couple of weeks I've been writing an interface for managing Spong - using Perl OO. In an object, if I realise that I need to use some code again (for example: sucking details out of the database about a host), it gets broken out into a new function in the object.

      So, there are some of my approaches. Of course, if I sat down and did some decent design beforehand I'd only need to use the first one - wishful programming - as the functions would already be broken up logically.

      Sigh...

      Updated: Right, that link to Spong now works as it should. My bad. (Thanks a for pointing it out!)

        Oh, you are so on the nail on this... I've gotten into a habit of doing this instead of writing pseudo-code:
        # authenticate user
        AuthenticateUser($foo, $bar) or die('Could not authenticate user');

        # a while later...
        # do some stuff
        SomeStuff($more, @params) or die('Could not do more stuff cause reason');

        # subs from here down
        ########################

        sub AuthenticateUser {
            return(1);
        }

        sub SomeStuff {
            return(1);
        }
        I tend to code this way especially when I'm discussing the program with colleagues. Then we all get a copy and each one goes out to fill in his/her respective blanks. In the end, I believe the result is usually very clear and quite maintainable!

        #!/home/bbq/bin/perl
        # Trust no1!
Re: When do you function?
by Albannach (Monsignor) on Dec 27, 2000 at 09:50 UTC
    Just some random comments here... my basic principle (and this is just a layman's opinion ;-) is to isolate any repeated sequence of operations by placing them in a function/subroutine with a descriptive name. This can certainly be carried too far, so in practice I make functions of things that are becoming irritating to type or cut/paste from elsewhere.

    I certainly agree with your first three points, and I'd like to add that the reduction of repetition also reduces coding errors. Even (especially?) when I cut and paste, I can introduce variations or subtle errors (often involving scoping) which are entirely avoided when I take the time to make a subroutine.

    I find that building a sub forces me to think more fully about what exactly I'm trying to express, as I try to make it as much of a black box as possible. Subs also give me ample room to add better error-checking and handling that might (gasp!) get left out if I were to strip down the operation and leave it in-line. And after all this extra effort, I get something that I can re-use elsewhere more easily than some sequence of lines from the middle of a big loop.
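
    A small, hypothetical example of that black-box habit (the file format and sub name are made up): the error handling lives inside the sub, so the caller stays a single readable line instead of an in-line block that might get trimmed down "just this once".

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Read simple "key = value" lines into a hash ref, complaining loudly
        # about anything it can't handle.
        sub read_config {
            my ($file) = @_;
            open my $fh, '<', $file
                or die "Can't open config file '$file': $!";
            my %config;
            while (my $line = <$fh>) {
                next if $line =~ /^\s*(#|$)/;    # skip comments and blank lines
                unless ($line =~ /^(\w+)\s*=\s*(.*?)\s*$/) {
                    warn "Ignoring malformed line $. of $file\n";
                    next;
                }
                $config{$1} = $2;
            }
            close $fh;
            return \%config;
        }

        my $config = read_config('app.conf');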

    On your second set of points:

    1 - On looping, I just don't like deep indentation, and especially if chunks of the loops are nicely isolated, I will put them in subs just to unclutter the structure, which leads me to commenting:

    2 - I tend to think the fewer comments the better, and that's not to make things harder for others. I mean that whenever I find myself making any comment at all (apart from header blocks, which should be quite detailed) I ask myself just what is so confusing here, why isn't the code obvious, and can I make it obvious and avoid the comment altogether? Jumping from sub to sub shouldn't be confusing if each sub does something that makes sense on its own. For a trivial example, in $a = sin($b) * cos($c) the functions each have clear and obvious purposes of their own, and the thought of calculating them in-line would be a great starter for the "fattest obfuscation" category...

    Finally there is the consideration of performance, especially if you are passing a lot of data to a function (in which case you should probably pass a reference anyway, but there are always issues...). I'd like to think that the compiler (speaking generally here) should optimize what I write and not really care whether it's a subroutine or in-line, but again in practice this isn't the case (yet, anyway), so it may well be that using a sub call slows down an operation that I will perform millions of times to the point that I shouldn't make the call. When I wrote a lot of C I enjoyed making elaborate preprocessor definitions to get the best of both worlds, and to some extent I miss that in Perl.
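
    A quick illustration of the reference point (the sums below are just a stand-in for real work): the first sub puts all 100,000 elements on Perl's argument stack on every call, while the second passes a single reference and the array itself stays put.

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @big = (1 .. 100_000);

        # Flattens the whole array into @_ on each call.
        sub sum_list {
            my $total = 0;
            $total += $_ for @_;
            return $total;
        }

        # Passes one reference instead.
        sub sum_ref {
            my ($aref) = @_;
            my $total = 0;
            $total += $_ for @$aref;
            return $total;
        }

        print sum_list(@big), "\n";
        print sum_ref(\@big), "\n";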

    I look forward to the more professional opinions of the learned monks on this question!

    --
    I'd like to be able to assign to an luser

      I just have one comment on... comments:

      I agree that comments on code are often redundant, and so should be used with caution.

      But comments on data should be all over the place, especially with Perl's complex types (yes, I mean hashes!). A simple comment like

      my %nodes; # node_id => ref_to_node

      can go a _long_ way to help you, or anybody else who has to maintain the code three months later.

        In my experience, 90% of comments are unnecessary. But, on the other hand, 90% of the comments that should exist are missing.

        Yes, code should be 'self-documenting', but comments should explain why you're doing what you're doing. It's no use being able to understand your regex or complex data structures etc if I can't work out why on earth you're doing things this way in the first place.

        Of course, to bring this back to the initial point, if you break everything down into simple, short, self-contained functions that do only one thing, and are well named, then it's going to be quite evident what they do, and POD will mostly be sufficient :)

        Tony

Re: When do you function?
by Falkkin (Chaplain) on Dec 27, 2000 at 10:42 UTC
    My rule of thumb: if a function doesn't make code any easier to read, it's not worth writing.

    At one point in my life (not very long ago ;)), I had the temptation to create lots of little 2-line and 3-line functions (probably coming from an OO background in school, where I was taught to make even simple variable accesses into object methods...)

    But, consider the following code:

    # test1.pl (uses no user-defined functions):
    for ($i = 0; $i < 100000; $i++) {
        print $i;
    }

    # test2.pl (uses a simple function):
    for ($i = 0; $i < 100000; $i++) {
        print func($i);
    }

    sub func {
        return $_[0];
    }

    # OUTPUT
    [falkkin@shadow ~/perl] time perl test1.pl > /dev/null

    real    0m4.826s
    user    0m4.240s
    sys     0m0.010s

    [falkkin@shadow ~/perl] time perl test2.pl > /dev/null

    real    0m14.227s
    user    0m14.100s
    sys     0m0.050s

    It's clear in this case that the overhead involved in calling a function (mostly involving pushing variables to perl's stack and popping them back off) makes the code run roughly 3 times more slowly. Generally, I try to avoid calls to "small" functions in inner loops whenever performance is anything of an issue; if I'm only reusing those few lines of code once or twice in a program, it's just not worth it to create a function for it, in my opinion.

      To mangle Mark Twain: "There are four kinds of lies: Lies, Damn Lies, Statistics, and Benchmarks".

      This is a good case for learning to use the Benchmark module. I found that creating an anonymous sub ref was slightly faster than a full sub, and that a bare block's efficiency gains (although still outstanding for this simple function) shrank as the sample size grew. If you are sensitive to milliseconds of difference, or have extremely complex algorithms, you should measure them in situ to determine whether a bare block is better than a subroutine.
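
      Here is roughly what that looks like with Benchmark's cmpthese (the trivial subs below are stand-ins; run it on your own code to get numbers that mean anything):

          #!/usr/bin/perl
          use strict;
          use warnings;
          use Benchmark qw(cmpthese);

          sub named { return $_[0] }
          my $anon = sub { return $_[0] };

          # Run each variant for about 3 CPU seconds and print a comparison chart.
          cmpthese(-3, {
              named_sub => sub { my $x = named(42) },
              anon_sub  => sub { my $x = $anon->(42) },
              bare      => sub { my $x = 42 },
          });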

      To me, the main routine should be not unlike an outline. Some works, like the standard five paragraph essay, don't really need this. For reference books or novels, on the other hand, it can be an essential aid to the writer. You can always paste the text of a subroutine into other parts of the script for production code where milliseconds count. This will assist the compiler in streamlining the code (although repeated functions will make the whole executable larger and perhaps increase compile time if you are compiling for each script load). But when you are designing the code, it would make sense to abstract most of your larger blocks, just like you would abstract chapters, sections, and paragraph sets when writing.

      I'm relatively new to perl (OK, a complete newbie), but after looking at the same program written in C, I got much different results: 330 ms and 350 ms user time. Is there a way in perl to force inlining of functions? Or maybe this just points out a place for improvements in the compiler... (which, I understand, due to the interpreted nature of perl, is necessarily minimal, but would this be a quick-fix job?)
        Did you test the performance of the perl programs (vs. equivalent C code) on your machine? Chances are relatively decent that your machine is better than my clunky 133 MHz Pentium.

        Fact correction: Perl is a compiled language, actually (well, as compiled as Java, anyhow). The perl program works by taking in your source file, and compiling it in several stages.

        Stage 1: the compile phase. In this phase, Perl converts your program into a data structure called a "parse tree". If the compiler sees a BEGIN block or any "use" or "no" declarations, it quickly hands those off for evaluation by the interpreter.
        Stage 2: Code generation (optional). If you're running one of the compiler backend modules (such as B::Bytecode, B::C, or B::CC), the compiler then outputs Perl bytecodes (much like a Java .class file) or a standalone chunk of (very odd-looking) C code. These code-generators are all highly experimental at the present.
        Stage 3: Parse-tree reconstruction. If you did stage 2, stage 3 remakes the parse tree out of the Perl bytecodes or C opcodes. This speeds up execution, because running the bytecodes as-is would be a performance hit.
        Stage 4: The execution phase. The interpreter takes the parse tree and executes it.

        This is Perl compilation in a nutshell... read Chapter 18 of the 3rd edition of Programming Perl for a more in-depth analysis. For many tasks (especially simple ones such as these), Perl will be slower than C, because C is basically a more portable form of assembly language, and assembly language (once actually assembled) works with the raw hardware, and is hence about as fast as you can get.

        Another difference between Perl and a native C app that may affect performance is the fact that Perl has its own stack (actually, it has several stacks) as opposed to a C program, which is likely to just use the system stack.

        I add my voice to this: there should be a way to inline functions and method calls.

        Granted, you can use the pre-processor (perl -P) to inline functions, but this does not work for method calls. This is really bad when designing OO Perl, where I find myself using straight hash access ($o->{field}) instead of accessors ($o->field) for some often-called methods (or writing painful and risky kludges), which makes maintenance much harder.

        I am actually very surprised this is not even a Perl 6 RFC; I would think it is a simple (and, I would think, quite easy to implement) way to enhance the speed or maintainability of OO Perl programs.
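
        For what it's worth, the one inlining Perl does offer is for constants: a sub with an empty () prototype whose body is a constant (or anything declared via the constant pragma) is folded in at compile time, per perlsub. That doesn't help with accessor methods, which is exactly the complaint above. A tiny sketch:

            #!/usr/bin/perl
            use strict;
            use warnings;

            sub PI () { 3.14159265358979 }    # empty prototype + constant body => inlined
            use constant DEBUG => 0;

            print "area: ", PI * 2 ** 2, "\n";
            print "debugging\n" if DEBUG;     # constant-folded away at compile time

            # Run-time method calls like $obj->field get no such treatment.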
