http://www.perlmonks.org?node_id=423901

prad_intel has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks ,

Well, I am a bit puzzled about subroutine calls eating the processor.

All I want to check is about my code, which uses a particular set of lines that gets repeated in 3 different places.

As my requirement is something like it takes 10-15 mins for the entire program to finish the task, if I put those lines which get repeated into a subroutine will it take more time, or should I leave it as it is?

Btw, I want some info about my previous doubt, which involves entering formulas into a CSV file through Perl without using modules.

Again, I have a doubt whether using more modules will add to the processor hammering.

Thanks and regards, Pradeep.S

Replies are listed 'Best First'.
Re: Subroutine speed
by blazar (Canon) on Jan 21, 2005 at 09:12 UTC
    Well, I am a bit puzzled about subroutine calls eating the processor.
    Well, it's known that sub calls are somewhat expensive in perl. But does it really matter? In most cases I find it doesn't. If you're really concerned about this, you could try benchmarking (Benchmark).
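    For instance, a minimal sketch with cmpthese() from Benchmark; the inline/called labels and the toy add() sub are made up purely for illustration:

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        sub add { my ($x, $y) = @_; return $x + $y }

        cmpthese(-2, {                            # run each for at least 2 CPU seconds
            inline => sub { my $r = 1 + 2 },      # the work written in place
            called => sub { my $r = add(1, 2) },  # the same work behind a sub call
        });

    With a body this tiny the call overhead dominates; with real work inside the sub, the gap usually shrinks into irrelevance.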

    Update: I've tried to link to Benchmark.pm's online documentation by means of [doc://Benchmark], but it links to http://www.perldoc.com/perl5.8.0/pod/func/Benchmark.html instead, which is not correct, of course. So I've inserted the link manually, but I'd like to know if I'm just being dense or if it should work as described in 43037.

    All I want to check is about my code, which uses a particular set of lines that gets repeated in 3 different places.
    In some cases an alternative to a sub is a loop, but you can't really do that if the three different places are too... ehm... different.
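    As a sketch of what I mean (borrowing the three directory names from the code you posted below):

        # instead of pasting the same block three times...
        for my $subdir ('oneof', 'oneofcmd_cln', 'oneofgsd_cln') {
            # ...the once-repeated lines go here, parameterised on $subdir
            print "processing $subdir\n";
        }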

    Update: I forgot to mention that prototyped subs which accept no arguments introduce only negligible overhead, which is why they're used for constant emulation. I want to stress, however, that this is not meant as an invitation to share global variables instead of passing parameters, especially for the OP.
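    By way of example, here's what that constant emulation looks like (MAX_RETRIES is a made-up name); the empty prototype lets perl inline the value, so no call happens at runtime:

        use strict;
        use warnings;

        sub MAX_RETRIES () { 5 }         # empty prototype: inlined as a constant

        print MAX_RETRIES * 2, "\n";     # prints 10 with no sub-call overhead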

    As my requirement is something like it takes 10-15 mins for the entire program to finish the task, if I put those lines which get repeated into a subroutine will it take more time, or should I leave it as it is?
    I doubt that this is a well-formed English sentence, but I'm not a native English speaker either...

    If I understand you correctly, however, why don't you try yourself, perhaps as suggested above?

    Btw I want some info about my previous doubt which involves entering formulas into a csv file through perl without using modules.
    As a matter of style, I'd post such a different question in a different article. FWIW it doesn't make much sense to me as stated: we should first agree on just what a "formula" is in a CSV file.

    As a side note, experience shows that whenever someone asks about doing something "without modules", they're asking the wrong question: they're typically convinced that, e.g., they can't install modules locally, which often is not really the case.

    That said, CSV files should be more or less manageable with "pure Perl". We still have to understand that "formula" thing.
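    For plain fields, even a hand-rolled join() will do; a sketch with made-up sample values (it breaks as soon as a field contains a comma, a quote or a newline):

        my @fields = ('report.txt', 'd41d8cd98f00b204e9800998ecf8427e', 1024);
        print join(',', @fields), "\n";    # one CSV record per line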

    Again, I have a doubt whether using more modules will add to the processor hammering.
    Oh, so this is what you're actually concerned about! Well, ditto as above, i.e. why don't you try it for yourself?

    In any case, as a general rule, I'd answer negatively: chances are that you'll speed the whole thing up, especially if we're talking about XS modules; that's one of the reasons they're created in the first place!

Re: Subroutine speed
by BrowserUk (Patriarch) on Jan 21, 2005 at 09:08 UTC

    You are asking questions, but have posted no code for us to examine. That makes it nearly impossible for anyone to try and help you.

    Post a minimal example of your code that demonstrates your problem, along with a clear description of your concerns (+any error messages), and you will receive a much greater response.

    As asked, this question and your previous two do not provide enough information for us to begin to address them.


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Subroutine speed
by BrowserUk (Patriarch) on Jan 21, 2005 at 12:12 UTC
    Well, I am a bit puzzled about subroutine calls eating the processor.

    You are recursively traversing a subtree, opening all the files and generating MD5 checksums. This will consume a lot of processor time, as the math involved in calculating MD5s is CPU intensive. The cost of a subroutine call is minuscule by comparison and is a complete red herring.

    You say it is taking 10-15 minutes as if that is too long. How many files, and how big are they? It doesn't sound unreasonable to me.

    Other than that, it is not clear to me exactly what problem you are asking for help with. I have your code, but I obviously cannot run it without creating a subdirectory tree that contains files with the names of those you are looking for, and I could not verify your timing without having the same number and sizes of files as you have.

    The biggest problem I see with your code is that you are reading all the directory entries into an array at each level of recursion, and recursing whenever you encounter a nested directory. That means that if your directories have lots of files and/or the directory structure is very deep, you are consuming large amounts of memory as you descend the tree.

    I think that perhaps your process is consuming so much memory that it is pushing your machine into swapping?

    If you are determined to continue to use your own directory traversal routine, then you should avoid "slurping" the whole directory into an array. Instead, call readdir in a while loop and process one entry at a time. This will require that you avoid using a BAREWORD directory handle (like DIRECTORY) and use a lexical instead. Otherwise you will run into conflicts during recursion.
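    A minimal sketch of that shape (the per-file work is elided as a comment):

        sub list_recursively {
            my ($directory) = @_;
            opendir my $dh, $directory or die "Cannot open directory $directory: $!";
            while (defined(my $entry = readdir $dh)) {
                next if $entry eq '.' or $entry eq '..';
                my $path = "$directory/$entry";
                if (-f $path) {
                    # ... stat/digest one file at a time; nothing piles up in memory
                }
                elsif (-d $path) {
                    list_recursively($path);    # the lexical $dh cannot clash here
                }
            }
            closedir $dh;
        }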

    If none of that previous paragraph makes sense to you, then you should probably consider using File::Find or similar instead.
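    Something along these lines, for example, where the callback just prints every file's full path (taking the top directory from the command line purely for illustration):

        use strict;
        use warnings;
        use File::Find;

        my $sourcedir = shift @ARGV or die "usage: $0 <directory>\n";

        find(sub {
            return unless -f;               # File::Find sets $_ to the basename
            print "$File::Find::name\n";    # full path of each file in the tree
        }, $sourcedir);

    File::Find does the recursion, the readdir looping, and the handle management for you.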

    BTW, you should have use strict; (not use Strict;).


    Examine what is said, not who speaks.
    Silence betokens consent.
    Love the truth but pardon error.
Re: Subroutine speed
by Mutant (Priest) on Jan 21, 2005 at 09:45 UTC
    99.9% of the time (at least), the overhead that subroutines, objects, dynamic memory allocation, etc. introduce is NOT relevant. If it were, we'd all still be programming in assembler. Processing speed is (these days) a very minor consideration in programming. Much more important is the maintainability of code, which is vastly improved by minimising code duplication.
Re: Subroutine speed
by prad_intel (Monk) on Jan 21, 2005 at 09:38 UTC
    Sorry, Monks,

    I am totally new to this kind of technical group, and I am learning how to describe my problems properly.

    Thanks to all those who have identified my mistakes in defining the problem.

    The code is as follows; I hope people can now easily test it and offer their best suggestions.

    #!/usr/bin/perl -w
    use Strict;
    use File::Stat;
    use Digest::MD5;

    print "Enter the Path where all the data files are available:\t";
    $sourcedir = <STDIN>;
    chomp($sourcedir);

    open(LOG, ">>D:\\prad\\log1.csv");
    #$workbook = Spreadsheet::WriteExcel->LOG('D:\prad\log1.csv');
    print LOG "Report Ver - 0.1\n";
    print LOG "_______________________________\n";
    print LOG "\nFilename,Md5sum(oneof),Size,Md5Sum(oneofcmd_cln),Size,Md5Sum(oneofgsd_cln),Size";
    print LOG "\n___________________________________________\n";

    list_recursively("$sourcedir\\oneof");
    exit;

    ############################################################################
    # Subroutine
    ############################################################################
    # list_recursively
    #
    # list the contents of a directory,
    # recursively listing the contents of any subdirectories
    #
    sub list_recursively {
        my ($directory) = @_;
        my @files = ();

        # Open the directory
        unless (opendir(DIRECTORY, $directory)) {
            print "Cannot open directory $directory!\n";
            exit;
        }

        # Read the directory, ignoring special entries "." and ".."
        @files = grep(!/^\.\.?$/, readdir(DIRECTORY));
        closedir(DIRECTORY);

        # If file, print its name
        # If directory, recursively print its contents
        # Notice that we need to prepend the directory name!
        foreach my $file (@files) {

            # If the directory entry is a regular file
            if (-f "$directory/$file") {
                $filepath  = "$directory";
                $filepath2 = $filepath;
                $filepath2 =~ s/oneof/oneofcmd_cln/;
                $filepath3 = $filepath;
                $filepath3 =~ s/oneof/oneofgsd_cln/;

                # Finding the size of the file
                # digesting
                open(FILE, "<$filepath\\$file");
                binmode(FILE);
                $file_size = "$filepath\\$file";
                @st = stat($file_size) or die "No $file: $!";
                $digest = Digest::MD5->new->addfile(*FILE)->hexdigest;

                # print file name in Log file
                print LOG "\n$filepath\\$file";
                printf LOG ",%s,%s", $digest, $st[7];
                close(FILE);

                if (open(FILE, "<$filepath2\\$file")) {
                    #print LOG "in MD2";
                    binmode(FILE);
                    $file_size = "$filepath2\\$file";
                    @st = stat($file_size) or die "No $file: $!";
                    $digest = Digest::MD5->new->addfile(*FILE)->hexdigest;
                    printf LOG ",%s,%s", $digest, $st[7];
                    close(FILE);
                }
                else { print LOG ",Null,"; }

                if (open(FILE, "<$filepath3\\$file")) {
                    #print LOG "in MD3";
                    binmode(FILE);
                    $file_size = "$filepath3\\$file";
                    @st = stat($file_size) or die "No $file: $!";
                    $digest = Digest::MD5->new->addfile(*FILE)->hexdigest;
                    #print file name in Log file
                    #print LOG "$filepath3\\$file";
                    printf LOG ",%s,%s", $digest, $st[7];
                    #here i need to include a formula into the csv file at the
                    #end of each line.
                    close(FILE);
                }
                else { print LOG ",Null"; }
            }

            # If the directory entry is a subdirectory
            elsif (-d "$directory/$file") {
                # Here is the recursive call to this subroutine
                print LOG "\nFolder - $directory\\$file \n";
                print LOG "----------------------------------\n";
                list_recursively("$directory\\$file");
            }
        }
    }
    ############################################################################

    Any improvements to the code would be a great addition. Sorry, thanks and regards,

    Pradeep.S
      The code is as follows; I hope people can now easily test it and offer their best suggestions.
      #!/usr/bin/perl -w
      It's better to
      use warnings;
      nowadays.
      use Strict;
      use File::Stat;
      use Digest::MD5;
      print "Enter the Path where all the data files are available:\t";
      Are you sure you want "\t" rather than "\n"?
      $sourcedir= <STDIN>; chomp($sourcedir);
      Since you're under strict, this won't even compile. It's a good idea to post the real code you ran.

      Update: Case matters! (I hadn't noticed your mistake at first.) Please (do yourself a favour and) reread your program with this in mind.
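      In other words, once the pragma is spelled correctly, undeclared variables such as $sourcedir become compile-time errors, so the script would have to declare them:

        use strict;                  # lower case: the real pragma
        use warnings;

        my $sourcedir = <STDIN>;     # strict now insists on the 'my'
        chomp $sourcedir;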

      open(LOG ,">>D:\\prad\\log1.csv");
      It's better to
      • use lexical FHs nowadays,
      • use the three-arg form of open(). (Opinions tend to vary on this, but as a general rule, I'd still recommend it.)
      • use single quotes (in this case), and use / as a directory separator even under Windows,
      • always check open()s for success. A minimal
        open my $fh, '>>', 'path/to/file' or die $!;
        can suffice.
      I'm not reading the rest of your script. You should consider preparing a minimal, working test program that still exhibits the problem you're concerned about, and submitting that. Chances are that in the process of doing so you will find the answer yourself...

        Just out of curiosity, why do you consider use warnings; superior to perl -w?

        -- vek --