http://www.perlmonks.org?node_id=744750

balakrishnan has asked for the wisdom of the Perl Monks concerning the following question:

UPDATE : i used Devel::NYTProf and it helped me lot for fine tuning.
Hi Monks,
I have a perl code which could do the text processing.
It gets a list of records as input,
mycompany----engineer=====itdept-----33
And finally, it gives the expected output, but the time taken for processing is not ok for me.
use Time::HiRes qw(time); sub Extract { my ($Line,$Name,$Pattern) = @_; my (@Arr,$CallId); print "Extract begins with the input line $Line\n"; print time,"\n"; @Arr = ($Line =~ m/$Pattern/g); print time,"\n"; $Age = $Arr[$#Arr]; print time,"\n"; $Details{$Age}{$Name} = \@Arr if defined($Age); print time,"\n"; } $str = "mycompany----engineer=====itdept-----33"; $Name = "MyName"; $Pattern = "([a-z]+)\-+([a-z]+)\=+([a-z]+)\-+([0-9]+)"; Extract($str,$Name,$Pattern);
I printed the time after the every line of processing inside the function Extract().

Help me to fine tune this code.

Replies are listed 'Best First'.
Re: Performance issue
by moritz (Cardinal) on Feb 18, 2009 at 13:57 UTC
    If you want to know which parts of your code consume the time, remove your print time lines and instead use a profiler like Devel::NYTProf.

    Then you can start optimizing.

Re: Performance issue
by ELISHEVA (Prior) on Feb 18, 2009 at 14:08 UTC
    It might help to post your timing results along with the code. Systems differ and it is possible that the timing results you see are specific to your system rather than something internal to the Perl interpreter.

    On my system, the time it takes to parse each line is essentially 0 - so I'm not sure what you are concerned about.

    1234974983.91134 1234974983.91134 1234974983.91134 1234974983.92134

    If you aren't getting the same results, I suspect the problem is your print lines. They take a *huge* amount of time relative to the regex parser. If I run Extract 1000 times and compare the time before the first run and after the 1000th run, I get 0.22 seconds with only the line: "Extract begins with..." printing out and 0.03 seconds if I remove all prints from within the loop

    The bottom line: don't use print statements to profile code - you'll be measuring the print statements, not the code performance. A better option is to use the -d:DProf option, e.g. perl -d:DProf myscript.pl - see Profiling Perl and module documentation Devel::DProf.

    Best, beth

Re: Performance issue
by graff (Chancellor) on Feb 18, 2009 at 14:49 UTC
    Apart from the timing issue (where I think previous replies have been very helpful), I'd point out that your "Extract" sub really does not make sense as a subroutine. It receives and declares variables that are never used, and uses/alters variables that must be global in scope (because they're never declared) -- bad form in both cases.

    You might want to get acquainted with Benchmark, and then maybe use that to compare alternate approaches. To use it effectively, you'll want to rethink your subroutine so that it's more "atomic" (independent, modular), which will be a Good Thing.

    You might also consider split as an alternative to regex matching with captures. Based on the data shown, something like this would suffice:

    $str = "mycompany----engineer=====itdept-----33"; my @pieces = split /\W+/, $str;
    I don't recall off-hand what sort of timing difference this might yield (but Benchmark would be a good way to find out).

    Maybe the stuff that your "Extract" sub is doing could be done without a sub call, and that might save some time (along with getting rid of the unnecessary print statements).

Re: Performance issue
by ww (Archbishop) on Feb 18, 2009 at 14:01 UTC

    Performance issue? .0002 seconds per sub call? is your input file multi-GB? (assumes $str is coming from a file or DB, which might suggest "performance issue" lies elsewhere).

    After adding strict and warnings and dealing with missing variable declarations, on a not-particularly-speedy w32 machine (results similar on Ubuntu):

    Extract begins with the input line mycompany----engineer=====itdept--- +--33 1234965298.64138 1234965298.64149 1234965298.64153 1234965298.64158

    Update Further thoughts:

    #!/usr/bin/perl use strict; use warnings; # use Data::Dumper; use Time::HiRes qw(time); my $Name = "MyName"; my ($Age, %Details); my @source = ('acompany----engineer=====itdept-----33', 'bcompany----butcher=====killingfloor-----34', 'acompany----baker=====oven-----35', 'acompany----candlestickmaker=====tallow-----36', 'acompany----monk=====monastery-----37', ); my $Pattern = "([a-z]+)\-+([a-z]+)\=+([a-z]+)\-+([0-9]+)"; my $start_time = time; for my $str(@source) { Extract($str, $Name, $Pattern); } my $end_time = time; print "back in main elapsed time for a 5-element \@source was: ", $end +_time - $start_time, "\n"; sub Extract { my @subarr = @_; my ($Line,$Name,$Pattern) = @subarr; my (@Arr,$CallId); # print "Extract begins with the input line $Line\n"; # print time,"\n"; @Arr = ($Line =~ m/$Pattern/g); # print time,"\n"; $Age = $Arr[$#Arr]; # print time,"\n"; $Details{$Age}{$Name} = \@Arr if defined($Age); #print time,"\n"; }

    Timings: first, with print at line 38 enabled:

    C:\ww>perl 744750.pl Extract begins with the input line acompany----engineer=====itdept---- +-33 Extract begins with the input line bcompany----butcher=====killingfloo +r-----34 Extract begins with the input line acompany----baker=====oven-----35 Extract begins with the input line acompany----candlestickmaker=====ta +llow-----36 Extract begins with the input line acompany----monk=====monastery----- +37 back in main elapsed time for a 5-element @source was: 0.0008189678192 +13867

    whereas, with line 38 commented out (ie, as shown):

    C:\ww>perl 744750.pl back in main elapsed time for a 5-element @source was: 0.0001859664916 +99219

    The difference, 0.000633001327514648 (roughly 3x the total elapsed time_when_omitting_the_prints in sub Extract) is attributable to the mis-use of print for profiling. See moritz's above & ELISHEVA's below.

Re: Performance issue
by targetsmart (Curate) on Feb 18, 2009 at 15:31 UTC
    The problem may not be with this routine 'Extract', profile your entire script under Devel::NYTProf, here is the webcast on how to use it, http://www.vimeo.com/1441518

    Vivek
    -- In accordance with the prarabdha of each, the One whose function it is to ordain makes each to act. What will not happen will never happen, whatever effort one may put forth. And what will happen will not fail to happen, however much one may seek to prevent it. This is certain. The part of wisdom therefore is to stay quiet.
Re: Performance issue
by slacker (Friar) on Feb 18, 2009 at 13:57 UTC
    Just a couple questions...
    How long does processing currently take?
    What is an acceptable amount of time?