Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

Skip to the bottom first!

There are a few places in the code where small savings can be achieved essentially for free:

  • 1 while $re; runs ~10% faster than while( $re ) {}.
  • $re && last; runs a few percent faster than $re && do{ last };

But even if you could make this level of savings on every single line in the program, you'd still save maybe an hour at most.

Looking at a few of the individual REs, nothing leaps off the page as being particularly extravagant. You should heed the annotation at the top of the profiling and try to remove usage of $&. This has been known to effect a substantial time saving.

The only place affected is this sub:

sub java_clean { my $contents = $_[0]; while ($contents =~ s/(\{[^\{]*)\{([^\{\}]*)\}/ $1."\05".&wash($2)/ges) {} $contents =~ s/\05/\{\}/gs; # Remove imports ##$contents =~ s/^\s*import.*;/&wash($&)/gem; $contents =~ s/(^\s*import.*;)/&wash($1)/gem; # Remove packages ##$contents =~ s/^\s*package.*;/&wash($&)/gem; $contents =~ s/(^\s*package.*;)/&wash($1)/gem; return $contents; }

The uncommented replacements should have the same effect (untested) and the changes could have a substantial affect on the overall performance of a script dominated by regex manipulations.

While you're at it, you can also add a few micro-optimisations where they are called millions of times like:

sub wash { ##### my $towash = $_[0]; return ( "\n" x ( $_[0] =~ tr/\n// ) ); }

which will save the 7 seconds spent copying the input parameter. But given that the overall runtime is 7 minutes, that's not going to have a big effect. The only way you're a likely to get substantial savings from within the script, is to try optimising the algorithms used -- which amounts to tuning all of the individual regexes; and the heuristics they represent -- and that comes with enormous risk of breaking the logic completely and would require extensive and detailed testing.

Bottom line

All of that said, if you split the workload across two processors, you're likely to achieve close to a 50% saving. Across 4, and a 75% saving is theoretically possible. It really doesn't make much sense to spend time looking for saving within the script when, with a little restructuring, it lends itself so readily to being parallelised.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: tight loop regex optimization by BrowserUk
in thread tight loop regex optimization by superawesome

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (5)
As of 2024-04-19 16:41 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found