
Re: Fast Way to Split String in to Chunk of Equal Length

by davido (Cardinal)
on Nov 25, 2011 at 08:11 UTC (#939995)

in reply to Fast Way to Split String in to Chunk of Equal Length

Are you sure that chunking the strings is where your performance bottleneck is? Have you profiled? Could IO be a more significant constraint to execution time? I only ask because it doesn't seem like the performance of substr for 10,000,000 strings is all that bad for the problem domain.
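One rough way to check where the time goes before reaching for a real profiler (Devel::NYTProf would give a far finer breakdown) is to time the read and the chunking separately. The sketch below is only an illustration: the helper name time_read_vs_chunk and the one-sequence-per-line input format are assumptions, not something from the original question, and the per-line time calls add overhead of their own, so treat the numbers as a coarse hint.

```perl
use strict;
use warnings;
use Time::HiRes qw/time/;

# Hypothetical helper (not from the thread): reads one sequence per line
# from a filehandle and accumulates wall-clock time spent reading versus
# time spent chunking with unpack.
sub time_read_vs_chunk {
    my ($fh) = @_;
    my ( $read_secs, $chunk_secs, $lines ) = ( 0, 0, 0 );

    my $mark = time;
    while ( defined( my $line = <$fh> ) ) {
        chomp $line;
        my $after_read = time;
        $read_secs += $after_read - $mark;

        my @chunks = unpack '(a3)*', $line;

        $mark = time;
        $chunk_secs += $mark - $after_read;
        $lines++;
    }
    return ( $read_secs, $chunk_secs, $lines );
}

# Usage with an in-memory filehandle; swap in a real file to measure real I/O.
my $data = "CTTCGAATT\nGGGAAATTT\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
my ( $read_secs, $chunk_secs, $lines ) = time_read_vs_chunk($fh);
close $fh;
printf "lines=%d read=%.6fs chunk=%.6fs\n", $lines, $read_secs, $chunk_secs;
```

If the read time dominates on the real data file, a faster chunking routine will not change the total run time much.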

If this section of code is really significant, here's a comparison of the valid solutions provided up to this point in the thread. Naturally unpack wins. Over 10,000,000 iterations it takes only about 3.4 CPU seconds more than the "control" case (9.04 versus 5.65), which isn't a solution, but just a check to see what the framework for each solution costs.

use strict;
use warnings;
use Benchmark qw/timethese/;

# Test/Benchmark parameters.
$main::string = q/CTTCGAATT/;
our $time = 10000000;

my $subs_to_test = {
    substr  => \&by_substr,
    match   => \&by_match,
    unpack  => \&by_unpack,
    control => \&control,
};

# Benchmark.
timethese( $time, $subs_to_test );

# Subs being benchmarked.
sub control {
    my @substrings;
    @substrings = qw/CTT CGA ATT/;
    return \@substrings;
}

sub by_substr {
    my $position = 0;
    my @substrings;
    while( $position < length $main::string ) {
        push @substrings, substr( $main::string, $position, 3 );
        $position += 3;
    }
    return \@substrings;
}

sub by_match {
    my @substrings;
    while( $main::string =~ m/(...)/sg ) {
        push @substrings, $1;
    }
    return \@substrings;
}

sub by_unpack {
    my @substrings;
    @substrings = unpack( '(a3)*', $main::string );
    return \@substrings;
}

Here's the output.

Benchmark: timing 10000000 iterations of control, match, substr, unpack...
   control:  5 wallclock secs ( 5.65 usr +  0.00 sys =  5.65 CPU) @ 1769911.50/s (n=10000000)
     match: 22 wallclock secs (21.12 usr +  0.00 sys = 21.12 CPU) @ 473484.85/s (n=10000000)
    substr: 12 wallclock secs (11.88 usr +  0.00 sys = 11.88 CPU) @ 841750.84/s (n=10000000)
    unpack:  8 wallclock secs ( 9.04 usr +  0.00 sys =  9.04 CPU) @ 1106194.69/s (n=10000000)
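Speed aside, it may be worth confirming that the candidates return the same chunks before picking one. The sketch below restates the three approaches with the string passed as an argument (the chunks_* names are made up for this illustration) and tries them on the 9-character test string and on a hypothetical 10-character one. The point it is meant to show: m/(...)/ needs three characters to match, so it silently drops a trailing partial chunk, while substr and unpack keep it.

```perl
use strict;
use warnings;

# The three approaches from the benchmark, rewritten to take the string
# as an argument. The sub names are illustrative, not from the thread.
sub chunks_substr {
    my ($string) = @_;
    my @chunks;
    for ( my $pos = 0; $pos < length $string; $pos += 3 ) {
        push @chunks, substr( $string, $pos, 3 );
    }
    return @chunks;
}

sub chunks_match {
    my ($string) = @_;
    # In list context a /g match returns every captured group.
    return $string =~ m/(...)/sg;
}

sub chunks_unpack {
    my ($string) = @_;
    return unpack '(a3)*', $string;
}

# First string divides evenly by 3; the second leaves one character over.
for my $string ( 'CTTCGAATT', 'CTTCGAATTG' ) {
    print "$string\n";
    printf "  %-6s %s\n", 'substr', join ' ', chunks_substr($string);
    printf "  %-6s %s\n", 'match',  join ' ', chunks_match($string);
    printf "  %-6s %s\n", 'unpack', join ' ', chunks_unpack($string);
}
```

If the input length is always a multiple of three, as in the benchmark string, the difference does not matter. Otherwise a pattern such as m/(.{1,3})/sg would keep the remainder in the regex version.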

By the way: this question was also crossposted to StackOverflow.

