I have two very long (>64k) strings of equal length, $s1 and $s2. They are strings of bytes, meaning that any value from chr(0) to chr(255) is legal. $s2, however, will never contain chr(0); $s1 may or may not. What I need to do is look at each byte in $s1 and, if it is chr(0), replace it with the corresponding byte in $s2. So, something like the following code:
sub foo {
    my ($s1, $s2) = @_;
    my @s1 = split //, $s1;
    my @s2 = split //, $s2;
    foreach my $idx ( 0 .. $#s1 ) {
        if ( $s1[$idx] eq chr(0) ) {
            $s1[$idx] = $s2[$idx];
        }
    }
    return join '', @s1;
}
foo() could return the resulting string or it could modify $s1 in place. If foo() returns the string, I'm going to be doing $s1 = foo( $s1, $s2 ); in all cases.
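To make the intended behaviour concrete, here is a tiny worked example. It is only a sketch of the semantics described above, not a fast solution; the name fill_nulls and the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (same logic as foo above): every chr(0) in $s1 is
# replaced by the byte at the same offset in $s2.
sub fill_nulls {
    my ($s1, $s2) = @_;
    my @s1 = split //, $s1;
    my @s2 = split //, $s2;
    for my $idx ( 0 .. $#s1 ) {
        $s1[$idx] = $s2[$idx] if $s1[$idx] eq "\0";
    }
    return join '', @s1;
}

# Offsets 1 and 3 of $s1 hold chr(0), so they pick up 'x' and 'z' from $s2.
my $s1 = "a\0b\0";
my $s2 = "wxyz";
print fill_nulls( $s1, $s2 ), "\n";    # prints "axbz"
```

Bytes of $s1 that are not chr(0) are left alone, whatever $s2 holds at that offset.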
Here's what I've got so far, including a Benchmark harness. Whoever comes up with the fastest version earns a meter of beer from me whenever we see each other.
#!/usr/bin/perl
use 5.6.0;
use strict;
use warnings FATAL => 'all';
use Benchmark qw( cmpthese );
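# Caution: do_rand() is called only once per string here, and `x` repeats
# that single byte 100_000 times (corrected in the updated script below).
my $s1 = join '', (do_rand(1) x 100_000);
my $s2 = join '', (do_rand(0) x 100_000);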
cmpthese( -2, {
    'split1'  => sub { my $s3 = split1( $s1, $s2 ) },
    'substr1' => sub { my $s3 = substr1( $s1, $s2 ) },
});
sub split1 {
    my ($s1, $s2) = @_;
    my @s1 = split //, $s1;
    my @s2 = split //, $s2;
    foreach my $idx ( 0 .. $#s1 ) {
        if ( $s1[$idx] eq chr(0) ) {
            $s1[$idx] = $s2[$idx];
        }
    }
    return join '', @s1;
}
sub substr1 {
    my ($s1, $s2) = @_;
    # The last valid offset is length - 1.
    for my $idx ( 0 .. length($s1) - 1 ) {
        if ( substr($s1, $idx, 1) eq chr(0) ) {
            substr($s1, $idx, 1) = substr($s2, $idx, 1);
        }
    }
    return $s1;
}
# This makes sure that $s1 has chr(0)'s in it and $s2 does not.
sub do_rand {
    my $n = (shift) ? int(rand(255)) : int(rand(254)) + 1;
    return chr( $n );
}
__END__
Update: It looks like there is a 2-way tie between avar and moritz. I went ahead and wrote an in-place version of moritz's code. Thanks to SuicideJunkie for fixing my stupidity in the test data. The script now looks like:
#!/usr/bin/perl
use 5.6.0;
use strict;
use warnings FATAL => 'all';
#use Test::More no_plan => 1;
use Benchmark qw( cmpthese );
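my $s1 = do_rand(0, 100_000);
my $s2 = do_rand(1, 100_000);
# split1 works in place, so run it on a copy; otherwise $s1 would have
# no chr(0) bytes left by the time the benchmark starts.
my $expected = $s1;
split1( \$expected, \$s2 );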
cmpthese( -3, {
    'avar2' => sub {
        my $s3 = $s1; avar2( \$s3, \$s2 );
        # is( $s3, $expected, "avar2" );
    },
    'moritz' => sub {
        my $s3 = $s1; moritz( \$s3, \$s2 );
        # is( $s3, $expected, "moritz" );
    },
});
sub split1 {
    my ($s1, $s2) = @_;
    my @s1 = split //, $$s1;
    my @s2 = split //, $$s2;
    foreach my $idx ( 0 .. $#s1 ) {
        if ( $s1[$idx] eq chr(0) ) {
            $s1[$idx] = $s2[$idx];
        }
    }
    $$s1 = join '', @s1;
}
sub avar2 {
    my ($s1, $s2) = @_;
    use bytes;
    $$s1 =~ s/\0/substr $$s2, pos($$s1), 1/eg;
}
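sub moritz {
    my ($s1, $s2) = @_;
    my $pos = 0;
    # index() returns -1 when nothing is found, so compare against 0
    # inclusively; a chr(0) at offset 0 must still be replaced.
    while ( 0 <= ( $pos = index $$s1, "\000", $pos ) ) {
        substr( $$s1, $pos, 1 ) = substr( $$s2, $pos, 1 );
    }
}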
sub do_rand {
    my ($min, $len) = @_;
    my $n = "";
    for (1 .. $len) {
        $n .= chr( rand(255-$min)+$min );
    }
    return $n;
}
__END__
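The commented-out is() calls hint at checking each candidate against the slow reference before timing it. A minimal sketch of such a check follows, assuming a handful of hand-picked edge cases rather than the random data. The names reference_fill, index_fill and agrees_with_reference are hypothetical; index_fill is just one index()-based candidate written for this illustration.

```perl
use strict;
use warnings;

# Slow but obviously correct in-place reference (same logic as split1).
sub reference_fill {
    my ($s1, $s2) = @_;
    my @s1 = split //, $$s1;
    my @s2 = split //, $$s2;
    for my $idx ( 0 .. $#s1 ) {
        $s1[$idx] = $s2[$idx] if $s1[$idx] eq "\0";
    }
    $$s1 = join '', @s1;
    return;
}

# index()-based candidate. index() returns -1 when nothing is found,
# so the loop test has to accept a hit at offset 0.
sub index_fill {
    my ($s1, $s2) = @_;
    my $pos = 0;
    while ( ( $pos = index $$s1, "\0", $pos ) >= 0 ) {
        substr( $$s1, $pos, 1 ) = substr( $$s2, $pos, 1 );
    }
    return;
}

# Returns 1 if the candidate matches the reference on every case, else 0.
sub agrees_with_reference {
    my ($candidate) = @_;
    my @cases = (
        [ "\0abc",  "wxyz" ],    # chr(0) at offset 0
        [ "abc\0",  "wxyz" ],    # chr(0) at the last offset
        [ "\0\0\0", "xyz"  ],    # nothing but chr(0)
        [ "abc",    "xyz"  ],    # no chr(0) at all
        [ "",       ""     ],    # empty strings
    );
    for my $case (@cases) {
        my ( $s1, $s2 ) = @$case;
        my ( $got, $want ) = ( $s1, $s1 );
        $candidate->( \$got, \$s2 );
        reference_fill( \$want, \$s2 );
        return 0 if $got ne $want;
    }
    return 1;
}

print agrees_with_reference( \&index_fill ) ? "ok\n" : "not ok\n";    # prints "ok"
```

Any in-place candidate that takes two string references can be passed to agrees_with_reference in the same way before it goes into cmpthese.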
I'm going to keep it open until 24 hours have passed from the initial posting of this node. If no one gets any faster, both moritz and avar have a meter of beer from me.
My criteria for good software:
- Does it work?
- Can someone else come in, make a change, and be reasonably certain no bugs were introduced?