I think the alternation in the s/^\s+|\s+$// version carries a significant time cost in large applications. I often work with tab-delimited files containing hundreds of thousands of lines. If I'm splitting those lines on tabs and then trimming each field, I'm going to pick the double-regex approach every time. I wrote some quick code to benchmark the double-regex approach against the single-regex approach on four strings.

use Benchmark;

my @words = ('trim_unneeded', '  front trim only', 'rear trim only   ', '  both side trim ');

for my $word (@words) {

    print "Benchmarking $word...\n\n";

    timethese(1_000_000, {
        double => sub { $word =~ s/^\s+//; $word =~ s/\s+$//; },
        single => sub { $word =~ s/^\s+|\s+$//; },
    });
}
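
Two things about the benchmark above are worth keeping in mind: it trims $word in place, so after the first iteration there is no whitespace left to remove, and the single-regex version has no /g, so it strips only one end of the string. Below is a rough sketch of a variant that copies the string on every call and adds /g. The sub names trim_double and trim_single and the iteration count are my own choices for illustration, not part of the original test, and I would not assume the timings come out the same way as the figures in this post.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Trim with two anchored substitutions, one per end.
sub trim_double {
    my ($s) = @_;          # copy, so the caller's string is left untouched
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    return $s;
}

# Trim with a single alternation; /g lets it handle both ends.
sub trim_single {
    my ($s) = @_;          # copy, so the caller's string is left untouched
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

my @words = (
    'trim_unneeded',
    '  front trim only',
    'rear trim only   ',
    '  both side trim ',
);

for my $word (@words) {
    print "Benchmarking '$word'...\n";
    # Every iteration starts from the same untrimmed string.
    cmpthese(500_000, {
        double => sub { my $t = trim_double($word) },
        single => sub { my $t = trim_single($word) },
    });
}
```

Both subs pay the same call and copy overhead, so whatever gap remains should come from the regexes themselves.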

The code was run on a Celeron D 2.8 GHz machine running XP with the following results:

'trim_unneeded'
Single Regex: 0.45 seconds
Double Regex: 2.27 seconds

' front trim only'
Single Regex: 0.67 seconds
Double Regex: 2.66 seconds

'rear trim only '
Single Regex: 0.67 seconds
Double Regex: 2.45 seconds

' both side trim '
Single Regex: 0.66 seconds
Double Regex: 2.44 seconds

That's after only 1,000,000 trims. In an 800,000-line file with 50 columns per line, we're talking about 40,000,000 trims. Assuming linear scaling, that means I give up about a minute of processing time per file per run. That's far more than the time it would have taken me to type two regexes. Admittedly, it's a small optimization, and it only matters to those who process files on the scale that I do, but for most people who end up typing the 'trim regex' often enough to complain about it on PerlMonks, it probably applies.
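
For anyone who wants to check the arithmetic, here is a small sketch of the extrapolation. It plugs in the two timings from the ' both side trim ' row above and assumes, as I did, that the cost scales linearly with the number of trims. The helper name extrapolate_gap is made up for this example.

```perl
use strict;
use warnings;

# Scale the gap between two benchmark timings up to a whole file.
# $bench_trims is how many trims the benchmark timings cover.
sub extrapolate_gap {
    my ($lines, $columns, $fast_secs, $slow_secs, $bench_trims) = @_;
    my $trims = $lines * $columns;                              # one trim per field
    my $gap   = ($slow_secs - $fast_secs) * $trims / $bench_trims;
    return ($trims, $gap);
}

# 800,000 lines x 50 columns, timings taken per 1,000,000 trims.
my ($trims, $gap) = extrapolate_gap(800_000, 50, 0.66, 2.44, 1_000_000);
printf "%d trims, roughly %.0f seconds of difference per pass\n", $trims, $gap;
```

Real files will not scale perfectly linearly (I/O and field lengths vary), so treat the result as a ballpark figure only.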

In reply to Re^2: what is the function should I need to use for trim white spaces ? by rational_icthus
in thread what is the function should I need to use for trim white spaces ? by jesuashok
