
Re: Optimizing existing Perl code (in practise)

by feloniousMonk (Pilgrim)
on Aug 19, 2002 at 18:02 UTC (#191258)

in reply to Optimizing existing Perl code (in practise)

I definitely think benchmarking is the key answer here.

I think no matter what, this is an implementation-specific problem. I always wrote Perl for
programmer speed and paid less attention to execution speed, until I started working on problems
big enough to involve datasets ranging from hundreds of megabytes to a few gigabytes in size.

I love Perl, but for data this big, and the bit of processing required, I would initially have gone
with either C or C++. BUT - I work in a place where almost everyone knows Perl and not many know C/C++,
so Perl optimization has become a big issue.

I've learned a lot about how slight code changes can increase efficiency, especially when
certain tasks need to be done many times over. I've seen major speed increases
just by benchmarking and trying a different solution while keeping the same algorithm.
Take, for example, these two ways of getting two fields out of a line:

    # option 1
    my @a = (); if ( $foo =~ /^(\d+)\s+(\w+)\s*$/ ) { @a = ($1, $2); }

    # option 2
    my @a = split (/\s+/, $foo);

Guess what? On my system, option #1 runs about 90% faster.
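A minimal sketch of how a comparison like this might be benchmarked with the core Benchmark module. The sample line and the sub names via_regex and via_split are assumptions made for illustration, and the numbers will depend on the data and the machine:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample line; real input would come from the actual data set.
my $foo = "12345 widget";

# Option 1: anchored regex with captures; the list stays empty on no match.
sub via_regex {
    my ($line) = @_;
    my @a = ();
    if ( $line =~ /^(\d+)\s+(\w+)\s*$/ ) { @a = ($1, $2); }
    return @a;
}

# Option 2: split on runs of whitespace; returns whatever split finds.
sub via_split {
    my ($line) = @_;
    my @a = split /\s+/, $line;
    return @a;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese( -1, {
    regex => sub { my @r = via_regex($foo) },
    split => sub { my @r = via_split($foo) },
} );
```

Feeding both subs a representative sample of real lines, rather than one fixed string, would give a more trustworthy picture.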

-felonious --

Replies are listed 'Best First'.
Re: Re: Optimizing existing Perl code (in practise)
by Anonymous Monk on Aug 19, 2002 at 18:48 UTC
    Those two code snippets are not at all similar in function, so benchmarking them is useless.
      Um, they do perform the same function. They both place two values
      into an array....

      Yes, the method is different but what I intended to illustrate is that for a given set of data,
      2 different methods of processing may have significant performance differences
      while giving the same results.

      Also implicit in the code is that the solution will not work everywhere, which is why optimization depends on what
      you intend to optimize.

      -felonious --

        No they don't. For starters, your split produces and assigns at least three values in every case the pattern matches. The difference in their effects may be irrelevant to your specific application, but that doesn't make them equivalent. Taking that into consideration from the start, you shouldn't have needed to benchmark them to predict the outcome.

        If you want a regex version that works in a meaningfully similar way to the split, it would have to look something like this:
            my @a = ($foo =~ /(?:\s+)?(.*?)(?=\s)/g);
        Because your pattern is as simple as \s+, you can formulate a regex version like
            my @a = ($foo =~ /(\S+)/g);
        but that doesn't generalize to splitting at foo(?:bar|baz)?
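To make the comparison concrete, here is a small sketch that runs the /(\S+)/g form next to two split variants. The input string and variable names are invented for illustration:

```perl
use strict;
use warnings;

# Hypothetical input with leading and trailing whitespace.
my $foo = "  alpha beta  gamma ";

my @by_match       = ( $foo =~ /(\S+)/g );  # every run of non-whitespace
my @by_split_re    = split /\s+/, $foo;     # keeps an empty leading field
my @by_split_space = split ' ', $foo;       # skips leading whitespace

print scalar(@by_match),       "\n";  # 3
print scalar(@by_split_re),    "\n";  # 4 (first element is the empty string)
print scalar(@by_split_space), "\n";  # 3
```

On this input the global match lines up with split ' ' rather than with split /\s+/, because only the latter yields an empty leading field when the string starts with whitespace.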

        Makeshifts last the longest.

        No, they do not perform the same function. Your regex method functions as a guard clause, allowing for a) early failure and b) avoiding assignment on failure. The split version performs the assignment even if the string does not match the pattern. If your data is always going to pass the regex, then the split version would be the faster one (and even better than your split version would be split " ", $foo). Care to show your benchmark where the regex version was 90% faster?
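A short sketch of the guard-clause difference described above. The helper name parse_guarded and both input lines are hypothetical:

```perl
use strict;
use warnings;

# Returns the captured pair, or an empty list when the line does not match.
sub parse_guarded {
    my ($line) = @_;
    return unless $line =~ /^(\d+)\s+(\w+)\s*$/;
    return ($1, $2);
}

# Hypothetical inputs: one well-formed line and one malformed line.
for my $line ( "42 answer", "not a record at all" ) {
    my @guarded = parse_guarded($line);
    my @fields  = split ' ', $line;
    printf "[%s] regex: %d field(s), split: %d field(s)\n",
        $line, scalar @guarded, scalar @fields;
}
```

For the malformed line the regex version yields nothing while split still hands back every whitespace-separated token, so a benchmark of the two only compares like with like if the input is known to be well formed.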
