Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Re: Re: Re: Re: Efficient run determination.

by PhiRatE (Monk)
on Nov 15, 2002 at 11:03 UTC ( #213110=note: print w/replies, xml ) Need Help??


in reply to Re: Re: Re: Efficient run determination.
in thread Efficient run determination.

As I noted, C is the language for the job. It doesn't really matter how efficient the regex solution, C can work on the data in-place without incurring much in the way of dynamic allocation costs. You have mitigated this disadvantage to an extent in your version by placing certain limitations on the algorithm, notably the 500 unit capture max. This changes the situation considerably. However the design you chose isn't competitive unfortunately :/ primarily I suspect because its not an optimised line within the re engine to be used like this, although that is conjecture sicne i'm not familiar with the internals.

Length: 1920
Enlil 2:    0.93203s
Dingus 1:   0.530455s
Dingus 2:   0.537765s
Rasta 1:    1.973259s
TommyW 1:   0.87257s
Robartes 1: 0.996671s
PhiRatE:    0.232084s
BrowserUk:  2.764623s

edit: note that this is for 100 iterations

With the code done like this:

# BrowserUk $t0 = [gettimeofday]; #! Set up big regex. 1-time hit. my $re ='(?:(.)(??{"$+*"}))?' x 500; $re = qr/$re/o; for (1..100) { @res = browseruk($stn); } print "BrowserUk: ".tv_interval( $t0 )."\n"; sub browseruk { $_ = shift; my @c = m/$re/; #! THIS LINE DOES ALL THE WORK. #! This truncates the list to exclude null matches returned from r +egex. $#c = $#- -1; return \@c; }

I anticipated that perhaps the startup cost of the regex generation might be causing the performance problem, so I ran a 1000 unit test as well, against only my entry.

Length: 1920
PhiRatE:    2.241276s
BrowserUk: 28.384692s

Note that you got similar results, I'm running 100-1000 runs of the same line, you only did one :)

My recommendation is either go with the Inline C one if you really need the speed, or Dingus' 2nd variant, which is the cleanest, closest perl variant.

Replies are listed 'Best First'.
Re: Re: Re: Re: Re: Efficient run determination.
by BrowserUk (Pope) on Nov 15, 2002 at 11:22 UTC

    I don't agree with your timing method. Including the regex generation into the timing is wrong. I the real application, the regex would be set up once at the beginning of the program or even pregenerated and read from a file or even embedded in the source. It can then be reused over and over.

    In the application for which this is destined for use, it would be used on each of 500 x 500 char strings, and this process it repeated as fast as the data buffer can be filled with the aim of getting the whole down to less than 1 second, hence the need for speed.

    Would you consider timing the repetition loop without the regex setup which is basically a compile time cost not a runtime one? Without the start up costs, I think you'll see a different picture.

    As for the limitation on the number of matches, I used 500 as that is the limit of the string length for my application, but again this would be adjusted to need at startup.


    Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
    Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
    Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
    Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

      The cost of the startup thing is only included once, and even without it it makes no difference, the timing is the same, the startup cost is miniscule in comparison to the cost of the rest of the process.

      If you don't agree with me, feel free to run your own benchmarks, my program is included at the end of this message. Your algorithm, while interesting, is one of the slowest.

      Iterations: 100
      Length: 1920
      PhiRatE 1:  0.236097s  Perl/C
      PhiRatE 3:  0.234754s  Perl/C
      Dingus 1:   0.541398s  Perl
      PhiRatE 2:  0.543576s  Perl
      Dingus 2:   0.580746s  Perl + RE
      TommyW 1:   0.897865s  Perl + RE
      Enlil 2:    0.964746s  Perl + RE
      Robartes 1: 1.021243s  Perl
      Rasta 1:    2.015298s  Perl + RE
      BrowserUk:  2.764815s  Perl + RE
      

      Code for my benchmarking is here. Feel free to fiddle around to your liking.

      use Data::Dumper; use Time::HiRes qw( usleep ualarm gettimeofday tv_interval ); use re 'eval'; $stn = "aaaaaaammm38fdkkkkkkkk3,,,,,,,,,,sad909999999994lkllllllllllll +lz,,,,,,,,,dd888888882jk2kkd8d888d8djkjkjkjkkk3kk4k5kkkk65"; $iterations = 500; for (1..4) { $stn.=$stn; } print "Iterations: $iterations\n";~ print "Length: ".length($stn)."\n"; # Enlil 2 $t0 = [gettimeofday]; for (1..$iterations) { @res = enlil_2($stn); } print "Enlil 2: ".tv_interval( $t0 )."\n"; # Dingus 1 $t0 = [gettimeofday]; for (1..$iterations) { @res = dingus_1($stn); } print "Dingus 1: ".tv_interval( $t0 )."\n"; # Rasta 1 $t0 = [gettimeofday]; for (1..$iterations) { @res = rasta_1($stn); } print "Rasta 1: ".tv_interval( $t0 )."\n"; # TommyW 1 $t0 = [gettimeofday]; for (1..$iterations) { @res = tommyw_1($stn); } print "TommyW 1: ".tv_interval( $t0 )."\n"; # Robartes 1 $t0 = [gettimeofday]; for (1..$iterations) { @res = robartes_1($stn); } print "Robartes 1: ".tv_interval( $t0 )."\n"; # PhiRatE 1 $t0 = [gettimeofday]; for (1..$iterations) { @res = p_process($stn); } print "PhiRatE 1: ".tv_interval( $t0 )."\n"; # BrowserUk $t0 = [gettimeofday]; #! Set up big regex. 1-time hit. my $re ='(?:(.)(??{"$+*"}))?' x 500; $re = qr/$re/o; for (1..$iterations) { @res = browseruk($stn); } print "BrowserUk: ".tv_interval( $t0 )."\n"; # Dingus 2 $t0 = [gettimeofday]; for (1..$iterations) { @res = dingus_2($stn); } print "Dingus 2: ".tv_interval( $t0 )."\n"; # PhiRatE 2 $t0 = [gettimeofday]; for (1..$iterations) { @res = phirate_2($stn); } print "PhiRatE 2: ".tv_interval( $t0 )."\n"; # PhiRatE 3 $t0 = [gettimeofday]; for (1..$iterations) { @res = p_process_2($stn); } print "PhiRatE 3: ".tv_interval( $t0 )."\n"; sub browseruk { $_ = shift; my @c = m/$re/; #! THIS LINE DOES ALL THE WORK. #! This truncates the list to exclude null matches returned from r +egex. $#c = $#- -1; return \@c; } sub enlil_2 { my $string = shift; my @bah; while ($string =~ /((.)\2*)/g) { push (@bah, [$2,$-[1],$+[1] - $-[1]]); } return \@bah; } sub dingus_1 { my $string = shift; my (@res, $c, $p, $i); $p = 0; $c = substr($string,$p,1); for ($i=1; $i<length($string); $i++) { next if ($c eq substr($string,$i,1)); push (@res, [$c,$p,($i-$p)]); $c = substr($string,$i,1); $p = $i; } push (@res, [$c,$p,($i-$p)]); return \@res; } sub dingus_2 { my $string = shift; my (@res, $i); $i = 0; while ($string =~ /(.)\1*/g) { push (@res, [$1, $i, pos($string)-$i]); $i = pos($string); } return \@res; } sub rasta_1 { my $string = shift; my ($pp, $l, @res); $l = length($string); $pp = 0; while ($pp < $l) { $c = substr $string, $pp, 1; if ($string =~ /\G\Q$c\E+/gc) { push @res,[$c,$pp,pos($string) - $pp]; $pp = pos($string); } } return \@res; } sub tommyw_1 { my $string = shift; my $pos=0; my @triples=(); my @reps=$string=~/((.)\2*)/g; while (@reps) { my $hits=shift @reps; my $char=shift @reps; push @triples, [$char, $pos, length $hits]; $pos+=length $hits; } return \@triples; } sub robartes_1 { my $string = shift; my @res; my @listedstring= split//,$string; my $prev=shift @listedstring; my $currstart=my $index=0; for (@listedstring) { if ($_ eq $prev) { $index++; } else { push @res, [$prev, $currstart, $index-$currstart+1]; $currstart=++$index; $prev=$_; } } push @res, [$prev, $currstart, $index-$currstart+1]; return \@res; } sub phirate_2 { $_ = shift; my @res; my $count=0; my ($prev, $next); my $i=0; $prev = $next = chop($_); while ($next || $prev) { if ($prev eq $next) { $count++; } else { push @res,[$prev, $i=$count, $count]; $prev = $next; $count = 1; } $i++; $next = chop; } return \@res; } use Inline C => <<'END_OF_C_CODE'; void p_process(char *s) { char prev = 0; long count = 0; long pos = 0; long i=0; AV *array; Inline_Stack_Vars; Inline_Stack_Reset; while((*s != 0) || (prev != 0)) { if (count==0) { pos = i; prev = *s; count = 1; } else if (prev == *s) { count++; } else { array = newAV(); av_push(array,newSVpvn(&prev,1)); av_push(array,newSViv(pos)); av_push(array,newSViv(count)); Inline_Stack_Push(newRV_inc(array)); pos=i; prev = *s; count=1; } i++; s++; } Inline_Stack_Done; } void p_process_2(char *s) { char prev = 0; long count = 0; long i=0; AV *array; Inline_Stack_Vars; Inline_Stack_Reset; prev = *s; while((*s != 0) || (prev != 0)) { if (prev == *s) { count++; } else { array = newAV(); av_push(array,newSVpvn(&prev,1)); av_push(array,newSViv(i-count)); av_push(array,newSViv(count)); Inline_Stack_Push(newRV_inc(array)); prev = *s; count=1; } i++; s++; } Inline_Stack_Done; } END_OF_C_CODE

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://213110]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (5)
As of 2019-05-26 20:07 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Do you enjoy 3D movies?



    Results (153 votes). Check out past polls.

    Notices?
    • (Sep 10, 2018 at 22:53 UTC) Welcome new users!