Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Perl slower than java

by Christian888 (Acolyte)
on Dec 08, 2010 at 21:53 UTC ( #876119=perlquestion: print w/ replies, xml ) Need Help??
Christian888 has asked for the wisdom of the Perl Monks concerning the following question:

I'm new to perl, and I find the language very pleasant. (In fact I love it.) I was eager to try it on something more serious than “hello world” so I wrote a little genetic algorithm solver (code is below and the explanation at http://www.ai-junkie.com/ga/intro/gat1.html). I was pleasantly surprised with conciseness of the code, but appalled by how slow the programme ran. Some time ago I wrote the same thing in java. Perl seems to run about 5 times slower, despite having only a fraction of the complexity of the java version. I wonder if I’ve done something wrong in installing my interpreter, or if there are some bugs in the code that make it so slow. Thanks for helping…

use strict; use warnings; use Math::Complex; use Win32; my $target = 100; my $population_size = 50; my $chromosome_size = 100; my $mutation_rate = 0.1; my $crossover_rate = 0.7; my @population = (); my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $duration; initialise_population(); my $generation_counter = 1; my $winner = check_for_winner(); until ($winner) { print( "generation $generation_counter time: " . Win32::GetTickCount +. " \n" ); @population = regenrate_population(); $winner = check_for_winner(); $generation_counter++; } print("Solution reached in generation: $generation_counter \n"); print("The chromosome: @$winner \n"); print( get_phenotype($winner) . " = " . get_result($winner) . " \n" ); sub regenrate_population { my @new_population = (); for my $i ( 0 .. ( $population_size - 1 ) ) { my $chromosome1 = get_nonrandom_chromosome(); my $chromosome2 = get_nonrandom_chromosome(); $new_population[$i] = get_child( $chromosome1, $chromosome2 ); } return (@new_population); } sub check_for_winner { my $winner; foreach my $chromosome (@population) { if ( get_result($chromosome) == $target ) { $winner = $chromosome; last; } } return ($winner); } sub get_nonrandom_chromosome { my $population_fitness_score = get_population_fitness_score(); my $rulet_position = rand($population_fitness_score); my $temp_score = 0; my $nonrandom_chromosome; foreach my $chromosome (@population) { $temp_score += get_fitness_score($chromosome); if ( $temp_score > $rulet_position ) { $nonrandom_chromosome = $chromosome; last; } } return ($nonrandom_chromosome); } sub get_child { my $chromosome1 = shift(@_); my $chromosome2 = shift(@_); my $new_chromosome; if ( rand(1) < $crossover_rate ) { my $crossover_point = int( rand($chromosome_size) ); for my $i ( 0 .. ( $chromosome_size - 1 ) ) { if ( $i < $crossover_point ) { push( @$new_chromosome, @$chromosome1[$i] ); } else { push( @$new_chromosome, @$chromosome2[$i] ); } } } else { if ( rand(1) > 0.5 ) { $new_chromosome = $chromosome1; } else { $new_chromosome = $chromosome2; } } if ( rand(1) < $mutation_rate ) { my $nucleotide_pos = int( rand($chromosome_size) ); if ( $$new_chromosome[$nucleotide_pos] ) { $$new_chromosome[$nucleotide_pos] = 0; } else { $$new_chromosome[$nucleotide_pos] = 1; } } return ($new_chromosome); } sub get_population_fitness_score { my $population_fitness_score = 0; foreach my $chromosome (@population) { $population_fitness_score += get_fitness_score($chromosome); } return ($population_fitness_score); } sub get_result { my $chromosome = shift(@_); my $phenotype = get_phenotype($chromosome); my $result = eval($phenotype); return ($result); } sub get_fitness_score { my $chromosome = shift(@_); my $result = get_result($chromosome); my $fitness_score = ( $target / ( $target + abs( $target - $result ) ) ) * 3; } sub initialise_population { for my $chromosome ( 0 .. $population_size ) { for my $nucleotide ( 0 .. $chromosome_size ) { $population[$chromosome]->[$nucleotide] = int( rand(1) + 0 +.5 ); } } } sub get_phenotype { my $chromosome = shift(@_); my @phenotype = (); my @expressed_phenotype = (); my $gene_length = gene_length(); my @nucleotides = @$chromosome; my @gene = (); my $pattern = q(\d); foreach my $nucleotide (@nucleotides) { if ( scalar(@gene) >= $gene_length ) { my $gene = join( "", @gene ); @gene = ($nucleotide); if ( defined( $genome{$gene} ) ) { push( @phenotype, $genome{$gene} ); } } else { push( @gene, $nucleotide ); } } foreach my $item (@phenotype) { if ( $item =~ m/$pattern/ ) { push( @expressed_phenotype, $item ); if ( $pattern eq q(\d) ) { $pattern = q(\D); } else { $pattern = q(\d); } } } if ( $expressed_phenotype[$#expressed_phenotype] =~ m/\D/ ) { pop(@expressed_phenotype); } return ( join( "", @expressed_phenotype ) ); } sub gene_length { my @gls = (); foreach my $key ( keys(%genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }

Comment on Perl slower than java
Download Code
Re: Perl slower than java
by SimonClinch (Chaplain) on Dec 08, 2010 at 22:24 UTC
    If the algorithm is exactly duplicated between Perl and Java, yes, Java is likely to win but C would beat them both. BUT the whole point of Perl is that it lends itself to creating more efficient algorithms easily and so much so that when approached properly, even C is likely to lose unless the C program is massive enough in its complexity to duplicate the terser optimal Perl algorithm. In this case, you are effectively processing all permutations instead of using any kind of sorting efficiency. Try to think of a way to represent the data so that as many losing iterations as possible simply aren't processed by an improved algorithm.

    One world, one people

      Processing-wise, my java equivalent is at least as inefficient as the perl code. I wonder if there is anything in the setting of the interpreter that may be slowing down the execution?

        perl >= 5.10 has gotten pretty bulky compared to older versions.

        I'm a bit curious as to how fast the newest ruby and python run the same algorithms.

        I bet if you would run the same code under perl 5.6 it would run a lot faster (the C data structures behind perl 5.6's variables are a lot more light weight). At least based on a very informal test we did here: Re^3: Why is this code so much slower than the same algorithm in C?.

Re: Perl slower than java
by Anonymous Monk on Dec 08, 2010 at 22:28 UTC
    That Perl smells a little like Java.

    For example, for some of those smaller functions, I'd do this:

    sub gene_length { my @gls = sort map { length } keys %genome; die "Invalid genotype" unless $gls[0] == $gls[$#gls]; return $gls[0]; }

    And this:

    use List::Util 'first'; sub check_for_winner { return first { get_result($_) == $target } @population }

    And this;

    use List::Util 'sum'; sub get_population_fitness_score { return sum map { get_fitness_score($_) } @population; }

    (Or just inline those statements, if they're only used once.)

    In other words, when you can trim it down so that you can actually tell what it's doing at a glance, then it's easier to benchmark!.

      That Perl smells like LISP ;)
        Well, I've never used Lisp, but I did stay at a Holiday Inn last night read Higher-Order Perl, so I'll take that as a compliment.
        My perl smells like a beginner’s fear of writing a code that he will not be able to read again in six months time :-) Hence I did everything as explicitly as possible, including all parentheses. I was not aware that being explicit would slow the code in perl. Is this correct? BTW, the slowest sub seems to be “get_phenotype” I think it take up to 25 milliseconds on my machine, which is a lot compared to the rest of the code. Can you see anything that might be causing the slowdown in that sub?
Re: Perl slower than java
by jwkrahn (Monsignor) on Dec 08, 2010 at 23:21 UTC
    use Math::Complex;

    It doesn't look like this module is used anywhere.    Is it actually needed?



    use Win32; ... "generation $generation_counter time: " . Win32::GetTickCount +. " \n" );

    Perhaps you could use Time::HiRes and make your code more portable.



    my $gene_length = gene_length(); ... if ( defined( $genome{$gene} ) ) { push( @phenotype, $genome{$gene} ); ... sub gene_length { my @gls = (); foreach my $key ( keys(%genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }

    You never modify the %genome hash and the keys are always the same size so you don't really need the gene_length subroutine.

    If you wanted to you could just do:

    my $gene_length = length +( keys %genome )[ 0 ];


    Some of the places where you use arrays could be made more efficient by using strings instead.

Re: Perl slower than java
by Khen1950fx (Canon) on Dec 08, 2010 at 23:38 UTC
    I ran your code as is and reached a solution in generation 30. Using Linux, I added Memoize, and it reached a solution in generation 20. Does that help?
    #!/usr/bin/perl use strict; use warnings; use Math::Complex; use Memoize; memoize( 'initialise_population', 'regenrate_population', 'check_for_winner', 'get_nonrandom_chromosome', 'get_child', 'get_population_fitness_score', 'get_result', 'get_fitness_score', 'get_phenotype', 'gene_length'); my $target = 100; my $population_size = 50; my $chromosome_size = 100; my $mutation_rate = 0.1; my $crossover_rate = 0.7; my @population = (); my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $duration; initialise_population(); my $generation_counter = 1; my $winner = check_for_winner(); until ($winner) { print( "generation $generation_counter time: \n" ); @population = regenrate_population(); $winner = check_for_winner(); $generation_counter++; } print("Solution reached in generation: $generation_counter \n"); print("The chromosome: @$winner \n"); print( get_phenotype($winner) . " = " . get_result($winner) . " \n" ); sub regenrate_population { my @new_population = (); for my $i ( 0 .. ( $population_size - 1 ) ) { my $chromosome1 = get_nonrandom_chromosome(); my $chromosome2 = get_nonrandom_chromosome(); $new_population[$i] = get_child( $chromosome1, $chromosome2 ); } return (@new_population); } sub check_for_winner { my $winner; foreach my $chromosome (@population) { if ( get_result($chromosome) == $target ) { $winner = $chromosome; last; } } return ($winner); } sub get_nonrandom_chromosome { my $population_fitness_score = get_population_fitness_score(); my $rulet_position = rand($population_fitness_score); my $temp_score = 0; my $nonrandom_chromosome; foreach my $chromosome (@population) { $temp_score += get_fitness_score($chromosome); if ( $temp_score > $rulet_position ) { $nonrandom_chromosome = $chromosome; last; } } return ($nonrandom_chromosome); } sub get_child { my $chromosome1 = shift(@_); my $chromosome2 = shift(@_); my $new_chromosome; if ( rand(1) < $crossover_rate ) { my $crossover_point = int( rand($chromosome_size) ); for my $i ( 0 .. ( $chromosome_size - 1 ) ) { if ( $i < $crossover_point ) { push( @$new_chromosome, @$chromosome1[$i] ); } else { push( @$new_chromosome, @$chromosome2[$i] ); } } } else { if ( rand(1) > 0.5 ) { $new_chromosome = $chromosome1; } else { $new_chromosome = $chromosome2; } } if ( rand(1) < $mutation_rate ) { my $nucleotide_pos = int( rand($chromosome_size) ); if ( $$new_chromosome[$nucleotide_pos] ) { $$new_chromosome[$nucleotide_pos] = 0; } else { $$new_chromosome[$nucleotide_pos] = 1; } } return ($new_chromosome); } sub get_population_fitness_score { my $population_fitness_score = 0; foreach my $chromosome (@population) { $population_fitness_score += get_fitness_score($chromosome); } return ($population_fitness_score); } sub get_result { my $chromosome = shift(@_); my $phenotype = get_phenotype($chromosome); my $result = eval($phenotype); return ($result); } sub get_fitness_score { my $chromosome = shift(@_); my $result = get_result($chromosome); my $fitness_score = ( $target / ( $target + abs( $target - $result ) ) ) * 3; } sub initialise_population { for my $chromosome ( 0 .. $population_size ) { for my $nucleotide ( 0 .. $chromosome_size ) { $population[$chromosome]->[$nucleotide] = int( rand(1) + 0 +.5 ); } } } sub get_phenotype { my $chromosome = shift(@_); my @phenotype = (); my @expressed_phenotype = (); my $gene_length = gene_length(); my @nucleotides = @$chromosome; my @gene = (); my $pattern = q(\d); foreach my $nucleotide (@nucleotides) { if ( scalar(@gene) >= $gene_length ) { my $gene = join( "", @gene ); @gene = ($nucleotide); if ( defined( $genome{$gene} ) ) { push( @phenotype, $genome{$gene} ); } } else { push( @gene, $nucleotide ); } } foreach my $item (@phenotype) { if ( $item =~ m/$pattern/ ) { push( @expressed_phenotype, $item ); if ( $pattern eq q(\d) ) { $pattern = q(\D); } else { $pattern = q(\d); } } } if ( $expressed_phenotype[$#expressed_phenotype] =~ m/\D/ ) { pop(@expressed_phenotype); } return ( join( "", @expressed_phenotype ) ); } sub gene_length { my @gls = (); foreach my $key ( keys(%genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }
Re: Perl slower than java
by JavaFan (Canon) on Dec 09, 2010 at 00:42 UTC
    So, you're new to Perl, and for your first non-trivial program you wrote, you ended up with a solution that's slower than a Java implementation you did. I presume you aren't as new to Java as to Perl.

    Don't you think that's a very small bases to conclude Java is faster than Perl?

Re: Perl slower than java (perl 12x faster than Java)
by BrowserUk (Pope) on Dec 09, 2010 at 02:50 UTC

    This produces the same results as your original, but 66 times faster. Which should make it about 12 times faster than your Java code.

    Of course, if you fold many of the optimisations--which mostly come down to not doing the same thing multiple times--back into the Java code, that would run more quickly too. Iffy algorithms are iffy in any language.

    #! perl -slw use strict; use List::Util qw[ sum ]; use Time::HiRes qw[ time ]; srand( 100 ); my $target = 100; my $population_size = 50; my $chromosome_size = 100; my $mutation_rate = 0.1; my $crossover_rate = 0.7; my @population = (); my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' + => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $gene_length = gene_length(); my $start = time; initialise_population(); my $generation_counter = 1; my $winner = check_for_winner(); until ($winner) { @population = regenrate_population(); $winner = check_for_winner(); $generation_counter++; } print("Solution reached in generation: $generation_counter"); print("The chromosome: @$winner"); print( get_phenotype($winner, $gene_length) . " = " . get_result($winn +er) ); printf "Took %.3f seconds\n", time()-$start; exit; sub regenrate_population { my @new_population = (); for ( 0 .. ( $population_size - 1 ) ) { $new_population[$_] = get_child( get_nonrandom_chromosome(), get_nonrandom_chromosome() ); } return (@new_population); } sub get_child { my( $chromosome1, $chromosome2 ) = @_; my $new_chromosome; if ( rand(1) < $crossover_rate ) { my $crossover_point = int( rand($chromosome_size) ); $new_chromosome = [ @{$chromosome1}[0..$crossover_point-1], @{$chromosome2}[$crossover_point..$chromosome_size-1], ]; } else { $new_chromosome = rand(1) > 0.5 ? $chromosome1 : $chromosome2 +; } $$new_chromosome[ rand($chromosome_size) ] ^= 0 if rand(1) < $muta +tion_rate; return $new_chromosome; } sub check_for_winner { get_result($_) == $target and return $_ for @population } sub get_fitness_score { my $chromosome = shift(@_); my $result = get_result($chromosome); my $fitness_score = ( $target / ( $target + abs( $target - $result ) ) ) * 3; } sub get_nonrandom_chromosome { my @scores = map { get_fitness_score($_) } @population; my $rulet_position = rand( sum @scores ); my $temp_score = 0; foreach my $i ( 0 .. $#scores ) { $temp_score += $scores[ $i ]; return $population[ $i ] if $temp_score > $rulet_position; } } my %memo; sub get_result { return $memo{ $_[0] } //= evalExpr( get_phenotype( $_[0], $gene_le +ngth ) ); } sub initialise_population { for my $chromosome ( 0 .. $population_size ) { for my $nucleotide ( 0 .. $chromosome_size ) { $population[$chromosome]->[$nucleotide] = int( rand(1) + 0 +.5 ); } } } sub get_phenotype { my( $chromosome, $len ) = @_; my $ep = join'', map { $genome{ $_ } // ''; } unpack "(a$len)*", join'', @$chromosome; $ep =~ s[^\D+][]; $ep =~ s[(\d)(\d+)][$1]g; $ep =~ s[(\D)(\D+)][$1]g; $ep =~ s[\D+$][]; return $ep; } sub gene_length { my $len = length( each %genome ); while( my $key = each %genome ) { die "Invalid genotype" unless length( $key ) == $len; } return $len; } sub evalExpr { local $_ = shift; s[(?<=[^*/+-])([*/+-])][<$1>]g; 1 while s[([^>]+)<([*/])>([^<]+)][$2 eq '*' ? $1 * $3 : $1 / $3]e +; 1 while s[([^>]+)<([+-])>([^<]+)][$2 eq '+' ? $1 + $3 : $1 - $3]e +; return $_; } __END__ Solution reached in generation: 5 The chromosome: 0 0 0 0 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 0 1 1 1 1 1 0 +0 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 + 1 1 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0 +1 0 1 0 9*4*3-8 = 100 Took 7.711 seconds Solution reached in generation: 5 The chromosome: 0 0 0 0 1 1 0 1 1 1 1 0 1 0 0 1 0 0 1 1 0 1 1 1 1 1 0 +0 0 1 0 0 0 0 0 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 + 1 1 1 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 1 1 0 1 1 1 0 0 0 0 0 0 +1 0 1 0 9*4*3-8 = 100 Took 0.115 seconds

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      I could not follow all of your code, but it seems it always produces the same solution, which may mean that it does not do what it is supposed to.

      Anyhow, taking on all the advice I got here (I have to admit I'm impressed with the answers I got - Thanks guys) I optimised the code and the current version runs over 100 times faster than the first one!!! I am very pleased with it actually :-)

      Having said that, I just cannot see why would one generation in the first version take about 2.5 seconds (on my machine) and the current one only 25 milliseconds. It doesn't seem that the new version does 100 times less calculation.

      It would be nice if someone opened my eyes. Anyway... Here is the latest optimised code. Thanks

      Christian

      use strict; use warnings; use List::Util qw[ sum ]; use Time::HiRes qw[ time ]; #use Math::Complex; #use Win32; my $total_time = time(); my $target = 100; my $population_size = 100; my $chromosome_size = 100; my $mutation_rate = 0.1; my $crossover_rate = 0.7; my @population = (); my @phenotypes = (); my @results = (); my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $gene_length = get_gene_length( \%genome ); my $duration; my $generation_counter = 1; my $winner; do { my $generation_time = time(); @population = @{ regenrate_population( \@population ) }; $winner = check_for_winner( \@population ); printf( "Generation $generation_counter took %.3f seconds to compl +ete\n", time() - $generation_time ); $generation_counter++; } until ($winner); print("Solution reached in generation: $generation_counter \n"); print("The chromosome: @{$winner->{chromosome}} \n"); print( $winner->{phenotype} . " = " . $winner->{result} . " \n" ); printf( "Total time to reach solution %.3f seconds \n", time() +- $total_time ); sub regenrate_population { my $old_population = shift(@_); my $new_population = []; if ( $old_population->[0] ) { my $population_fitness_score=get_population_fitness_score($old +_population); for my $individual ( 0 .. ($population_size) ) { my $chromosome1 = get_nonrandom_chromosome($old_population +,$population_fitness_score); my $chromosome2 = get_nonrandom_chromosome($old_population +,$population_fitness_score); $new_population->[$individual]->{chromosome} = get_child( $chromosome1, $chromosome2 ); $new_population->[$individual]->{phenotype} = get_phenotype( $new_population->[$individual]->{chromoso +me} ); $new_population->[$individual]->{result} = get_result( $new_population->[$individual]->{phenotype} +); $new_population->[$individual]->{fitness_score} = get_fitness_score( $new_population->[$individual]->{resu +lt} ); } } else { for my $individual ( 0 .. $population_size ) { for my $nucleotide ( 0 .. $chromosome_size ) { $new_population->[$individual]->{chromosome}->[$nucleo +tide] = int( rand(1) + 0.5 ); } $new_population->[$individual]->{phenotype} = get_phenotype( $new_population->[$individual]->{chromoso +me} ); $new_population->[$individual]->{result} = get_result( $new_population->[$individual]->{phenotype} +); $new_population->[$individual]->{fitness_score} = get_fitness_score( $new_population->[$individual]->{resu +lt} ); } } return ($new_population); } sub check_for_winner { my $winner; my $population = shift(@_); foreach my $individual (@$population) { if ( $individual->{result} == $target ) { $winner = $individual; last; } } return ($winner); } sub get_nonrandom_chromosome { my $population=shift(@_); my $population_fitness_score = shift(@_); my $rulet_position = rand($population_fitness_score); my $temp_score = 0; my $nonrandom_chromosome; foreach my $individual (@$population) { $temp_score += $individual->{fitness_score}; if ( $temp_score > $rulet_position ) { $nonrandom_chromosome = $individual->{chromosome}; last; } } return ($nonrandom_chromosome); } sub get_child { my $chromosome1 = shift(@_); my $chromosome2 = shift(@_); my $new_chromosome; if ( rand(1) < $crossover_rate ) { my $crossover_point = int( rand($chromosome_size) ); $new_chromosome = [ @$chromosome1[ 0 .. $crossover_point ], @$chromosome2[ ( $crossover_point + 1 ) .. ($chromosome_si +ze) ] ]; } else { if ( rand(1) > 0.5 ) { $new_chromosome = $chromosome1; } else { $new_chromosome = $chromosome2; } } if ( rand(1) < $mutation_rate ) { my $nucleotide_pos = int( rand($chromosome_size) ); if ( $$new_chromosome[$nucleotide_pos] ) { $$new_chromosome[$nucleotide_pos] = 0; } else { $$new_chromosome[$nucleotide_pos] = 1; } } return ($new_chromosome); } sub get_population_fitness_score { my $population = shift(@_); my $population_fitness_score = sum( map { $_->{fitness_score} } (@$population) ); return ($population_fitness_score); } sub get_result { my $phenotype = shift(@_); my $result = eval($phenotype) ; return ($result); } sub get_fitness_score { my $result = shift(@_); my $fitness_score = ( $target / ( $target + abs( $target - $result ) ) ) * 3; return ($fitness_score); } sub get_phenotype { my $chromosome = shift(@_); my @phenotype = (); my @expressed_phenotype = (); my @nucleotides = @$chromosome; my @gene = (); my $pattern = q(\d); foreach my $nucleotide (@nucleotides) { if ( scalar(@gene) >= $gene_length ) { my $gene = join( "", @gene ); @gene = ($nucleotide); if ( defined( $genome{$gene} ) ) { push( @phenotype, $genome{$gene} ); } } else { push( @gene, $nucleotide ); } } foreach my $item (@phenotype) { if ( $item =~ m/$pattern/ ) { push( @expressed_phenotype, $item ); if ( $pattern eq q(\d) ) { $pattern = q(\D); } else { $pattern = q(\d); } } } if ( $expressed_phenotype[$#expressed_phenotype] =~ m/\D/ ) { pop(@expressed_phenotype); } return ( join( "", @expressed_phenotype ) ); } sub get_gene_length { my $genome = shift(@_); my @gls = (); foreach my $key ( keys(%$genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }
        but it seems it always produces the same solution, which may mean that it does not do what it is supposed to.

        Hm. At the top of the code I posted you'll see a line srand(100);, this seeds the random number generator so that it will always produce the same sequence of numbers. It's purpose is to cause the program to follow the same sequence of steps each run so that modifications to code can be compared for accuracy and timing.

        The very essence of optimisation process is that you must ensure a) that you don't break anything; b) that you are comparing like with like. Artificially forcing the program to go through the same steps each run does that.

        Once tested and demonstrated, simply delete that line and it will generate truly random solutions each time per your original. I left it in so you could test the accuracy and performance for yourself.

        Having said that, I just cannot see why would one generation in the first version take about 2.5 seconds (on my machine) and the current one only 25 milliseconds. It doesn't seem that the new version does 100 times less calculation.

        For the most part, that is exactly what my optimisations do. They simple avoid doing the same calculation more than once.

        As an example, in your get_phenotype() sub you have:

        my $gene_length = gene_length();

        which means that you call this subroutine over and over:

        sub gene_length { my @gls = (); foreach my $key ( keys(%genome) ) { push( @gls, length($key) ); } @gls = sort(@gls); if ( $gls[0] != $gls[$#gls] ) { die("Invalid genotype"); } return ( $gls[0] ); }

        Which iterates %genome pushing the lengths of its keys to an array; then sorts those lengths; then checks that the first and last are the same.

        In my version of gen_length(), I avoid this O(n)+O(n log n) process by comparing the length of the first key to each of the others, which is just O(n):

        sub gene_length { my $len = length( each %genome ); while( my $key = each %genome ) { die "Invalid genotype" unless length( $key ) == $len; } return $len; }

        But far more significantly, as the lengths of the keys in %genome never vary during the life of the program, I call it only once at the start of the program:

        my %genome = ( '0001' => '1', '0010' => '2', '0011' => '3', '0100' => '4', '0101' + => '5', '0110' => '6', '0111' => '7', '1000' => '8', '1001' => '9', '1010' => '+', '1011' => '-', '1100' => '*', '1101' => '/' ); my $gene_length = gene_length();

        As get_phenotype() is called many thousands of time, you are repeating the get_length() processing many thousands of times for no purpose. And your original code is rife with similar things.

        Another example:

        sub get_nonrandom_chromosome { ### get_population_fitness_score() ### ### calculates the score for every chromosome ### my $population_fitness_score = get_population_fitness_score(); my $rulet_position = rand($population_fitness_score); my $temp_score = 0; my $nonrandom_chromosome; foreach my $chromosome (@population) { ### you then REcalculate them individually here ### $temp_score += get_fitness_score($chromosome); if ( $temp_score > $rulet_position ) { $nonrandom_chromosome = $chromosome; last; } } return ($nonrandom_chromosome); }

        In my version, I only calculate the scores once and reuse them:

        sub get_nonrandom_chromosome { my @scores = map { get_fitness_score($_) } @population; my $rulet_position = rand( sum @scores ); my $temp_score = 0; foreach my $i ( 0 .. $#scores ) { $temp_score += $scores[ $i ]; return $population[ $i ] if $temp_score > $rulet_position; } }

        But the single biggest saving comes in get_result():

        my %memo; sub get_result { return $memo{ $_[0] } //= evalExpr( get_phenotype( $_[0], $gene_le +ngth ) ); }

        For each iteration of the main until( $winner ) loop, most of the chromosomes in the population will not have changed, but your algorithm calls for their results to be recalculated each time.

        The use of %memo above avoids the costly process of reconstructing and evaling the phenotype over and over by storing it the first time it is calculated and returning the stored value whenever it is called for again. I also dodged the notoriously slow eval code by using my own simple four function expression evaluator:

        sub evalExpr { local $_ = shift; s[(?<=[^*/+-])([*/+-])][<$1>]g; 1 while s[([^>]+)<([*/])>([^<]+)][$2 eq '*' ? $1 * $3 : $1 / $3]e +; 1 while s[([^>]+)<([+-])>([^<]+)][$2 eq '+' ? $1 + $3 : $1 - $3]e +; return $_; }

        This is less efficient than it could be because it was necessary to follow the same rules of precedence that Perl's eval does in order to ensure my code produced the same results as your original.

        If however, per the web page linked from your OP, the operations were evaluated in strict left to right order, that could be rather more efficient than it currently is. But, from experience, if an optimised solution doesn't produce exactly the same results as the original, even if the original is in error according to the spec., then the optimisations will be dismissed.

        And that hides another limitation I imposed on myself. In order to ensure that the output of my version matched yours, it was necessary to ensure that rand was called exactly the same number of times in the same sequence. This prevented me employing many further optimisations.

        For example. You've opted to store and manipulate your chromosomes as strings of ascii 0s and 1s. This is easy to work with, but uses a full byte per bit. Moving to using bit strings opens up a whole gamut of further optimisations--both space and time--that add up to another order of magnitude, perhaps two, of performance. But as you seem happy with what you now have, perhaps I'll get around to posting that as a meditation some day.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Perl slower than java
by roboticus (Canon) on Dec 09, 2010 at 03:24 UTC

    Christian888:

    I haven't analyzed your code in any depth (time constraints, lack of sleep, too many interactions with global variables). But it does appear that you're not taking advantage of some of the advantages of perl to get things going a bit faster.

    A couple random notes:

    • Upon a cursory review, it appears like you do quite a few unnecessary array copies. It may help if you try passing them around by reference instead of by value. For example:
      $population = regenrate_population(); ... sub regenrate_population { my $new_population = []; for my $i ( 0 .. ( $population_size - 1 ) ) { my $chromosome1 = get_nonrandom_chromosome(); my $chromosome2 = get_nonrandom_chromosome(); $$new_population[$i] = get_child( $chromosome1, $chromosome2 ) +; } return $new_population; }
    • You do a bunch of looping instead of taking advantage of list operations, like array slices. Consider the difference between:
      my $crossover_point = int( rand($chromosome_size) ); for my $i ( 0 .. ( $chromosome_size - 1 ) ) { if ( $i < $crossover_point ) { push( @$new_chromosome, @$chromosome1[$i] ); } else { push( @$new_chromosome, @$chromosome2[$i] ); } }
      and:
      my $crossover_point = int( rand($chromosome_size) ); $new_chromosome = [ @{$chromosome1}[ 0 .. $crossover_point-1], @{$chromosome2}[ $crossover_point .. $#{$chromosome2} ] ];
    • You can take advantage of the malleability of perls data structures to prevent recalculations, too, but I'm having trouble thinking of a good way to say it. I'll try again in the morning, if time allows. (Short form: rather than store population as a list of scalars (chromosomes), store the population as a list of array references, [ chromosome, fitness score, etc ]. Then when you call get_population_fitness_score, you can populate the fitness score slot, and not recompute it again until the next pass.
    • I don't know that anyone claims that perl is all that fast, anyway. We don't use it because the final product is all that fast, but because it lets us get the final product quickly. That said, I rarely find the speed of the final product to be noticeable for my purposes.
    • Finally, if you're concerned about speed, then you should definitely start playing with Benchmark so you can tell the difference in speed between various code constructs, and with a profiler so you can find out which bits of your program are consuming all the time. (I think it's Devel::Profile, but I'm not certain, as I rarely profile my code.)

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

Re: Perl slower than java
by dHarry (Abbot) on Dec 09, 2010 at 11:09 UTC

    I hate to disappoint you but Java is pretty fast nowadays. Many clever optimizations have improved the performance of the Java Virtual Machine. Although startup is typically slower than other languages and Java programs tend to be more memory hungry many benchmarks show that for many types of tasks the performance is close to that of C++. So although you can optimize/improve your perl script, in ways others have suggested, I doubt you can really beat the Java implementation in terms of execution speed. Also, it's not unthinkable your java program can't be optimized. If you really want speed C is probably the best choice (unless you like assembler;). I typically use Perl to prototype solutions because I can do that (much) faster compared to other languages. Normally I re-implement in Java and when speed is essential in C.

    Cheers

    Harry

    PS I'm working on CSP's and looking into Genetic Algorithms for solving them, well, approaching a decent solution in less time than other algorithms take. I know of several open source Java implementations, I wonder why you wrote it yourself? Would you care to share your code?

      Even with the fastest possible machine and language, performance-aware algorithm design usually makes more difference. To take an extreme case, I once rewrote a C-program (which is faster than C++) in Perl and improved the performance by a factor of 60. To re-implement my algorithm in C to try to go even faster (theoretically possible of course) would however have meant an enormous investment in rewriting a C-PAN module and a lot of Perls guts in pure C to support the better algorithm that Perl had inspired.

      Update: this took place in ING Baring's market data department many years ago - the case was the calculation of correlation co-efficients, from N market prices, generating MxN(N-1)/2 values for M times at which prices where snapshot. The C-program did 2*M*N*N iterations through home-grown correlation coding. The Perl program used a CPAN module iterating it only N(N-1)/2 times with a few optimisations thrown in of the kind we've discussed already.

      One world, one people

        performance-aware algorithm design

        I fulle agree, however sometimes there is not a lot you can do. Or sometimes the cost is less important/irrelevant, only speed matters. To give an example, I'm working on over-constrained CSP's and the algorithms are al at least NPC. Speeds is very important. We will even settle for sub-optimal solutions if we can speed things up. This is exlactly the reason why I've been looking into genetic algorithms.

        If you can improve the performance of a C-program by a factor 60 by rewriting it in Perl I would argue it was a poor implementation in C in the first place. I'm a big fan of Perl but I'm not religious when it comes to programming, when I need speed there are normally better options available.

        Cheers

        Harry

Re: Perl slower than java
by sundialsvc4 (Abbot) on Dec 09, 2010 at 20:30 UTC
    Duplicate... oops.
Re: Perl slower than java
by sundialsvc4 (Abbot) on Dec 09, 2010 at 20:32 UTC

    The relative speed of various languages depends much more on what is being done with them, and how the programs are written, than the inherent “speed of” the language implementation itself.   If you are moving lots of data around, for example, page-faults are likely to bite you no matter how you write it.   And, if you are doing a totally CPU-intensive gut-busting operation (likewise for a very I/O-intensive one...), it’s not likely to be too pretty.   Where all of these languages shine is when they’re able to give you instant access to well-designed hashes, lists, arrays, memory-management and so on through a tiny amount of actual source-code.   Bang for your buck.

Re: Perl slower than java
by Anonymous Monk on Dec 10, 2010 at 04:42 UTC

    While choosing a tool to do something, You must first figure out what you want to achieve.

    If I have 10 hours to cover a distance of 100 miles, I can do that using a small motorcycle easily. It makes zero sense for me to use a Ferrari instead.

    Using a swiss army chainsaw where you needed to use butchers knife and then complaining that you couldn't cut meat faster has no meaning.At the same time you cannot carry butchers knife everywhere and use it in all versatile situations as the swiss army chainsaw can be used.

Re: Perl slower than java
by wazoox (Prior) on Dec 10, 2010 at 13:13 UTC

    Choosing the right tool for the job, a digest.

    If your application is CPU-bound, Perl will certainly be quite a poor fit, and be beaten by C (obviously) and most of the time, Java.

    If your application is IO-bound (many are, be it network or disk IO), most languages will perform about the same.

    In your case, most naive perl implementations will be slower than Java, being CPU-bound. However, the win may be in the development and maintenance time; after all if your program runs in 30 ms instead of 5 ms, that hardly makes any difference to you.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://876119]
Approved by ww
Front-paged by ww
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others perusing the Monastery: (8)
As of 2014-12-28 19:49 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (182 votes), past polls