Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw

Benchmarking Simple Perl program vs. Java equivalent

by roibrodo (Sexton)
on Jun 18, 2010 at 13:52 UTC ( #845380=perlquestion: print w/ replies, xml ) Need Help??
roibrodo has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

As a continue to some responses from a previous thread (Some code optimization and thanks again, everybody, despite the criticism - no hard feelings :)) I decided to try rewrite a small piece of prel code in java, in order to get some feeling about the performance differences.

I wrote the java equivalent real quickly, and it's not so "beautiful", but I think it works OK.

The results:
perl: 20.8 seconds
java: 3.9 seconds

Here is the perl code:

use strict; use warnings; use List::Util qw(max min); use Time::HiRes qw(time); # this builds a structure that is usually retrieved from disk. # in this example we will use this structure again and again, # but in the real program we obviously retrieve a fresh structure # at each iteration my $simulation_h = {}; for ( 1 .. 70000 ) { my $random_start = int( rand(5235641) ); my $random_length = int( rand(40000) ); push @{ $simulation_h->{$random_start} }, $random_length; } my $zone_o = { _chromosome_length => 5235641, _legal_range => [ { FROM => 100000, TO => 200000 } ] }; my $start_time = time; scenario(); print "total loop time: " . ( time - $start_time ) . " seconds\n"; my $temp_gene_to_legal_range; my $gene_to; sub scenario { for ( my $i = 0 ; $i < 50 ; $i++ ) { print "i=$i time=" . ( time - $start_time ) . " seconds\n"; # originally there was a retreive of $simulation_h from disk h +ere # iterate genes foreach my $gene_from ( keys %{$simulation_h} ) { foreach my $gene_length ( @{ $simulation_h->{$gene_from} } + ) { ### inlining gene_to_legal_range $gene_to = ( ( $gene_from + $gene_length - 1 ) % ( $zone_o->{_chromosome_length} ) ) + 1; if ( $gene_to < $gene_from ) { # split # low range first $temp_gene_to_legal_range = [ { FROM => 0, TO => $gene_to }, { FROM => $gene_from, TO => $zone_o->{_chromosome_length} } ]; } else { # single $temp_gene_to_legal_range = [ { FROM => $gene_from, TO => $gene_to } ]; } } } } }

And here is the java code:

import java.util.ArrayList; import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Random; import java.util.Set; public class InLine { public static void main(String[] args) { int count = 0; Random rand = new Random(); Map<Integer, List<Integer>> genes = new HashMap<Integer, List< +Integer>>(); for (int i = 0; i < 70000; i++) { Integer randStart = rand.nextInt(5235641); Integer randLength = rand.nextInt(40000); if (!genes.containsKey(randStart)) { List<Integer> list = new ArrayList<Integer>(); genes.put(randStart, list); } genes.get(randStart).add(randLength); } int chromosomeLength = 5235641; SimpleRange simpleRange = new SimpleRange(100000, 200000); Set<SimpleRange> setSimpleRanges = new HashSet<SimpleRange>(); setSimpleRanges.add(simpleRange); long startTimeMillis = System.currentTimeMillis(); for (int i = 0; i < 50; i++) { System.out .println("Iter = " + i + " Time from start: " + ((double) (System.currentTimeMillis() - +startTimeMillis) / 1000)); for (int geneFrom : genes.keySet()) { for (int geneLength : genes.get(geneFrom)) { int geneTo = (geneFrom + geneLength - 1) % chromos +omeLength + 1; if (geneTo < geneFrom) { // split // low range first SimpleRange lowSimpleRange = new SimpleRange(0 +, geneTo); SimpleRange highSimpleRange = new SimpleRange( +geneFrom, chromosomeLength); Set<SimpleRange> setSimpleRanges1 = new HashSe +t<SimpleRange>(); setSimpleRanges1.add(lowSimpleRange); setSimpleRanges1.add(highSimpleRange); count++; } else { // single SimpleRange simpleRange1 = new SimpleRange(gen +eFrom, geneTo); Set<SimpleRange> setSimpleRanges1 = new HashSe +t<SimpleRange>(); setSimpleRanges1.add(simpleRange1); count++; } } } } System.out .println("Time from start: " + ((double) (System.currentTimeMillis() - star +tTimeMillis) / 1000) + " count=" + count); } } class LegalRange { Set<SimpleRange> set; public LegalRange(Set<SimpleRange> set) { this.set = set; } } class SimpleRange { int from; int to; public SimpleRange(int from, int to) { this.from = from; = to; } }

Am I missing anything? Is my perl code highly inefficient in some mysterious way?

I was quite surprised by the difference.

Comment on Benchmarking Simple Perl program vs. Java equivalent
Select or Download Code
Re: Benchmarking Simple Perl program vs. Java equivalent
by bluescreen (Friar) on Jun 19, 2010 at 01:28 UTC

    You can optimize your code a little bit by transforming the C-like for( my $i = 0 ; $i < 50 ; $i++ ) into for(0..50) and keys %{$simulation_h} into an array outside the for loops with all that I've reduced execution time 17% ( aprox ~ 3.5 secs ).

    In general the more you try to optimize your code the uglier and less maintainable it will be. On the other hand usually strong typed languages like C/Java can be optimized more than weak typed ones.

    You choose Perl not because its faster than Java or C you choose it because your productivity gets boosted in Perl and you can come up real quick with working prototypes or you use the flexibility it gives you.

    If execution speed is the only thing you care about then:

    • Buy a faster machine
    • Buy several machines and divide the work among them
    • Transform your program to a multi-process program to take advantage of multiple processors or idle time when accessing a shared resource
    • Start programming in C/Java

    I'd recommend you to read High Performance Perl it is an interesting thread

Re: Benchmarking Simple Perl program vs. Java equivalent
by Krambambuli (Deacon) on Jun 19, 2010 at 11:25 UTC
    I'm too much of a Java-ignorant to be able to pinpoint the 'real' differences here, but it seems to me that the Java-hashlike construct you're using is quite a different beast than Perl's hashes.

    Whereas in Perl you have real arrays of real hashes, and the hashes deep down there _do have_ _stored_ keys _and_ stored values, I somewhat miss the hash keys in the Java code - but I might be wrong.

    Obviously, if the basic underlying data structures used in Perl are indeed much more sophisticated/complicated than what's there for it in Java - despite their 'almost alikeness' - the processing time will reflect this difference between the structures used.

    Update By rewriting the core of your Perl code as
    if ( $gene_to < $gene_from ) { # split # low range first $temp_gene_to_legal_range = [ # { FROM => 0, TO => $ge +ne_to }, # { # FROM => $gene_ +from, # TO => $zone_ +o->{_chromosome_length} # } 0, $gene_to, $gene_from, $zo +ne_o->{_chromosome_length} ]; } else { # single $temp_gene_to_legal_range = # [ { FROM => $gene_from, TO +=> $gene_to } ]; [ $gene_from, $gene_to ] }
    the time displayed goes down on my machine from approx. 14.5 to 11.4. Might be that that's another path to follow.
Re: Benchmarking Simple Perl program vs. Java equivalent
by Marshall (Prior) on Jun 20, 2010 at 05:59 UTC
    I also ran some benchmarks on this code. The overall time as posted took about 24+ seconds. Executing the loop "mechanics" of the foreach within scenario() takes about 4.6 seconds. The time to calculating the modulus statement about the same. The the memory allocation stuff in the "inlining section" takes about 3x as long as either one of these segments, e.g. 1/5 loop mechanics, 1/5 modulo calculations, 3/5 "saving results of "inlining section"

    1. The time to create the hash to begin with is very fast(negligible). But I never see anywhere where the random access nature of a hash is used. Iterating through an array will be much faster than iterating over all the keys of a large hash. Perl allows "ragged 2D" array structures an(AoA) and something like that might be more appropriate if you are interested in speed.

    2. Using and iterating over some kind of Array based structure would be considerably faster than all keys of a hash.

    3. I was surprised to find how much time the "$gene_to" calculation took. I'm not sure why that is or what could easily be done about it.

    4. The "$temp_gene_to_legal_range" calculation is a bit "non-real world" because this allocates a anonymous array containing one or 2 anonymous hashes (which also have to be dynamically allocated and created), But then the $temp_gene_to_legal_range array reference is "thrown away". This leaves the memory still allocated, but in a way that you will never be able to reference it again. So this part of the code is "expensive" due to all the memory structures that are being dynamically created. Krambambuli's code is faster because it eliminates the anon hash allocations.

    5. So with 70,000x50x(( 1 array allocation)+(1 to 2 hash allocations-guess 1.5 avg?)), sub scenario() is going to take awhile and it does! Anyway it appears that saving this calculation is so expensive, that you'd be better of calculating it when you need it and then use it right then. Right now its not even saved so I have no idea of the eventual plan to make use of this.

    6. I don't know enough about your problem to know what to recommend exactly in the way of alternative data structures, but my benchmarking indicates that this 70,000x50x~2.5 new memory structure allocations is what is taking the time - If I have my decimal point right, that is close to 9 million!. A more sophisticated 2-d hash structure or a Array of Hash or Array of Array of Array may serve the purpose better? It may be faster to extend some existing structure than create many millions of new little ones.

    I don't much about Java and can't speak to the relative "apples to apples" or "apples to oranges" comparative nature of your code. I don't know if your Java code calls new memory allocation 9 million times, but your Perl code does.

      Thank you very much for the wise words. I will try to follow your advice.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://845380]
Approved by marto
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others meditating upon the Monastery: (18)
As of 2014-07-10 19:05 GMT
Find Nodes?
    Voting Booth?

    When choosing user names for websites, I prefer to use:

    Results (215 votes), past polls