http://www.perlmonks.org?node_id=213206


in reply to Efficient run determination.

Okay. After some interesting discussion with PhiRatE, and utilsing his benchmark, I conclude that barring an Inline::C approach for various reasons but mostly because I want the solution to run on machines that do not have a C compiler set up, I declare Tah Tah Tah Tah. PhiRatE the winner anyway with his pure Perl solution.

Much to my surprise, his chop backwards and count approach seems faster than any another (in Perl). PhiRatE++ and thanks.

With some adjustments to the regex to prevent backtracking and the need for runtime interpolation, my big regex approach doesn't fair so badly as before. I should point out that mine is not a full solution in that it doesn't return the positions/length from the subroutine. These would be derived from @-, in situ. This biases the test in my favour somewhat. However, utilising the values that are in @- rather than copying them elsewhere would be a valid strategy in use. However, even with this advantage, it's still well short of PhiRatE's perl Solution.

My thanks to everyone that took a crack.

The results are

C:\test>213121.pl Iterations: 500 Length: 500 Enlil_2: 0.02351 sec/iteration found:208 Dingus_1: 0.01400 sec/iteration found:208 PhiRatE_2: 0.00987 sec/iteration found:208 Rasta_1: 0.05989 sec/iteration found:208 TommyW_1: 0.01516 sec/iteration found:208 Robartes_1: 0.01560 sec/iteration found:208 BrowserUk: 0.01186 sec/iteration found:208 Dingus_2: 0.01785 sec/iteration found:208 C:\test>

It doesn't include the C version as I'm not set up for Inline::C.

The benchmark code follows:

#! perl -slw use strict; use Time::HiRes qw( gettimeofday tv_interval ); my $re = ''; $re .="(?:(?:(.)(?:\\$_*))|\$)" for 1 .. 500; $re = qr/$re/o; my $stn = "aaaaaaammm38fdkkkkkkkk3,,,,,,,,,,sad909999999994lkllllll +llllllz" . ",,,,,,,,,dd888888882jk2kkd8d888d8djkjkjkjkkk3kk4k5kkkk65 + "; for (1..2) { $stn.=$stn; } my $iterations = 500; print 'Iterations: ', $iterations; print 'Length: ', length($stn); for my $sub ( qw/Enlil_2 Dingus_1 PhiRatE_2 Rasta_1 TommyW_1 Robartes_ +1 BrowserUk Dingus_2/ ){ #! p_process_2/ ) { my $res; my $t0 = [gettimeofday]; for (1..$iterations) { no strict 'refs'; $res = &{$sub}($stn); } printf "%20s: %7.5f sec/iteration found:%3d\n", $sub, tv_interval +( $t0 )/$iterations, scalar @$res; } sub BrowserUk { $_ = shift; my @c = m/$re/; $#c = $#- -1; return \@c; } sub Enlil_2 { $_ = shift; my @bah; while ( /((.)\2*)/g) { push (@bah, [$2,$-[1],$+[1] - $-[1]]); } return \@bah; } sub Dingus_1 { my $string = shift; my ($p, $c, $i, @res) = (0, substr($string,0,1)); for ($i=1; $i<length($string); $i++) { next if ($c eq substr($string,$i,1)); push (@res, [$c,$p,($i-$p)]); $c = substr($string,$i,1); $p = $i; } push (@res, [$c,$p,($i-$p)]); return \@res; } sub Dingus_2 { $_ = shift; my (@res, $i, $p); $i = 0; while ( /(.)\1*/g ) { push (@res, [$1, $i, pos()-$i]); $i = pos; } return \@res; } sub Rasta_1 { $_ = shift; my ($pp, $l, @res, $c) = (0, length); while ($pp < $l) { $c = substr $_, $pp, 1; if ( /\G\Q$c\E+/gc ) { push @res,[$c, $pp, pos() - $pp]; $pp = pos; } } return \@res; } sub TommyW_1 { $_ = shift; my ($pos, @triples) = (0); my @reps= /((.)\2*)/g; while (@reps) { my $hits=shift @reps; my $char=shift @reps; push @triples, [$char, $pos, length $hits]; $pos+=length $hits; } return \@triples; } sub Robartes_1 { my $string = shift; my ($currstart, $index, @res) = (0, 0); my @listedstring = split//, $string; my $prev = shift @listedstring; for (@listedstring) { if ($_ eq $prev) { $index++; } else { push @res, [$prev, $currstart, $index-$currstart+1]; $currstart=++$index; $prev=$_; } } push @res, [$prev, $currstart, $index-$currstart+1]; return \@res; } sub PhiRatE_2 { $_ = shift; my ($count, $i, $prev, $next, @res) = (0, 0); $prev = $next = chop($_); while ($next || $prev) { if ($prev eq $next) { $count++; } else { push @res,[$prev, $i=$count, $count]; $prev = $next; $count = 1; } $i++; $next = chop; } return \@res; } __END__ use Inline C => <<'END_OF_C_CODE'; void p_process_2(SV *svv) { char prev = 0; long count = 0; long i=0; long len = SvCUR(svv); AV *array; AV *list; char *s = SvPV(svv, len); Inline_Stack_Vars; Inline_Stack_Reset; list = newAV(); av_extend(list, 500); prev = *s; for (i=0; i<len; i++) { if (prev == *s) { count++; } else { array = newAV(); av_push(array,newSVpvn(&prev,1)); av_push(array,newSViv(i-count)); av_push(array,newSViv(count)); av_push(list,newRV_inc(array)); prev = *s; count=1; } s++; } Inline_Stack_Push(newRV_inc(list)); Inline_Stack_Done; } END_OF_C_CODE

Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Replies are listed 'Best First'.
Re^2: Efficient run determination.
by Enlil (Parson) on Mar 08, 2003 at 08:04 UTC
    I was looking over the results and I mistakenly put that my first solution was slower than my second one (did I mention that it was late).

    Anyhow, I went ahead and used the same benchmarking methods as used by BrowserUK above to see the results for myself except that I wanted to see how different solutions would do under different conditions. So I modified the code a little (both to do a more thorough benchmark (on the Perl only solutions, as the Inline C version is tons faster so no need to beat another dead horse),and also to add my first attempt (Enlil_1)):

    One of the first things that I noticed, is that PhiRate's Perl code is broken, as the while($next||$prev) will return false when two 0's show up in a row (i.e. "00"). Regardless, running this code on both a RH Linux 7.2 running Perl 5.8 (compiled from source) or AS Perl on WinXP Pro Enlil_1 and Dingus_2 were usually neck in neck (when PhiRate's solution returns correctly it is usually close as well though almost always last most of my results slower).

    Even when I play around with the $max_size, and the multiplier (the "x int(rand(9))" part), the rankings seem to stay the same.

    Here are some results (from both places though I have removed the string as it is sometimes rather long.):

    -enlil