Re^3: Why does Perl get slower when building a larger hash? (Not due to the memory swapping)

by tmharish (Friar)
on Mar 04, 2013 at 08:56 UTC ( #1021609 )


in reply to Re^2: Why does Perl get slower when building a larger hash? (Not due to the memory swapping)
in thread Why does Perl get slower when building a larger hash? (Not due to the memory swapping)

Dividing by $x ( the number of files ) does not help, because you still have an O( log(N) ) term in there ( the inner loop ).

As a demonstration, consider the following code that benchmarks hash access times:

use strict ;
use warnings ;

use Time::HiRes qw( time ) ;

my @bench_mark_points = ( 1_000, 10_000, 100_000, 1_000_000, 10_000_000,
                          20_000_000, 30_000_000, 40_000_000, 50_000_000 ) ;
my $bench_mark_point  = 0 ;

my %large_hash ;
my @keys ;

# Build the hash one key at a time, benchmarking access whenever the
# hash reaches one of the sizes in @bench_mark_points.
for( my $elem = 0 ; $elem < 50_000_001 ; $elem++ ) {
    my $key = rand( $elem ) . "hello" . rand( $elem ) ;
    $large_hash{ $key } = 1 ;
    push @keys, $key ;
    if( $elem == $bench_mark_points[ $bench_mark_point ] ) {
        _bench_mark( $bench_mark_points[ $bench_mark_point ] ) ;
        $bench_mark_point++ ;
    }
}

# Time one million random lookups against the hash at its current size.
sub _bench_mark {
    my $benchmark_point = shift ;

    my @benchmark_keys = map( { $keys[ int( rand( $benchmark_point ) ) ] } ( 0 .. 1_000_000 ) ) ;
    my $total_time     = 0 ;

    foreach my $key ( @benchmark_keys ) {
        my $start_time = time ;
        my $val        = $large_hash{ $key } ;
        my $end_time   = time ;
        $total_time   += ( $end_time - $start_time ) ;
    }

    print "Benchmarked Hash access of size $benchmark_point \t -> Access time for 1_000_000 keys: " . $total_time . "\n" ;
    return ;
}

Output

Benchmarked Hash access of size 1000    	 -> Access time for 1_000_000 keys: 0.11689305305481
Benchmarked Hash access of size 10000   	 -> Access time for 1_000_000 keys: 0.121062278747559
Benchmarked Hash access of size 100000   	 -> Access time for 1_000_000 keys: 0.125393152236938
Benchmarked Hash access of size 1000000 	 -> Access time for 1_000_000 keys: 0.116819381713867
Benchmarked Hash access of size 10000000 	 -> Access time for 1_000_000 keys: 0.118601083755493
Benchmarked Hash access of size 20000000 	 -> Access time for 1_000_000 keys: 0.117170572280884

You will notice that the time to access a million keys is nearly always the same. The memory, on the other hand, shoots up dramatically, reaching 7.8 GB at around 25 million entries ( which is why my bench-marking stops there ).

Perl is working hard in the background, building structures that ensure finding an element ( which would naively be O(N) ) takes close to constant time here.
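
To see those structures at work, here is a minimal sketch ( not part of the code above ) that watches the bucket table grow as keys are added. On the Perls of this era ( before 5.26 ) a hash evaluated in scalar context reports "used/allocated" buckets; on 5.26 and later the same string is available from Hash::Util::bucket_ratio instead:

use strict ;
use warnings ;

my %h ;
for my $n ( 1 .. 1_000_000 ) {
    $h{ "key" . $n } = 1 ;

    # Report at a few sizes: the allocated bucket count keeps doubling,
    # so the chains stay short and lookups stay near constant time.
    printf "%8d keys -> buckets: %s\n", $n, scalar( %h )
        if $n == 1_000 or $n == 10_000 or $n == 100_000 or $n == 1_000_000 ;
}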

UPDATE: Here is the output again with the CPU throttled so the difference can actually be seen:

Benchmarked Hash access of size 1000    	 -> Access time for 1_000_000 keys: 0.438439846038818
Benchmarked Hash access of size 10000   	 -> Access time for 1_000_000 keys: 0.467595815658569
Benchmarked Hash access of size 100000  	 -> Access time for 1_000_000 keys: 0.615071773529053
Benchmarked Hash access of size 1000000 	 -> Access time for 1_000_000 keys: 0.804181814193726
Benchmarked Hash access of size 10000000 	 -> Access time for 1_000_000 keys: 0.873048782348633
Benchmarked Hash access of size 20000000 	 -> Access time for 1_000_000 keys: 0.910530567169189

So the time does go up, and that's expected: you cannot find an element in less than O( log N ). The speed here comes from Perl optimizing this at a low level, but eventually the effect adds up.
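
If you want to see how much it adds up to without hand-rolling the timing, here is a minimal sketch using the core Benchmark module ( not part of the post above; the hash sizes are only illustrative, and the 10 million entry hash needs a couple of GB of RAM ) that compares lookup rates for a small and a large hash directly:

use strict ;
use warnings ;
use Benchmark qw( cmpthese ) ;

my ( %small, %big ) ;
$small{ "k$_" } = 1 for 1 .. 10_000 ;
$big{   "k$_" } = 1 for 1 .. 10_000_000 ;

# Pre-pick random keys so key generation is not part of what gets timed.
my @small_keys = map { "k" . ( 1 + int( rand( 10_000 ) ) ) }     1 .. 1_000 ;
my @big_keys   = map { "k" . ( 1 + int( rand( 10_000_000 ) ) ) } 1 .. 1_000 ;

# Each iteration performs 1_000 lookups; cmpthese prints a rate comparison.
cmpthese( -3, {
    'hash_10k' => sub { my $x = 0 ; $x += $small{ $_ } for @small_keys ; },
    'hash_10M' => sub { my $x = 0 ; $x += $big{ $_ }   for @big_keys ;   },
} ) ;

How large the gap looks will depend on CPU cache and memory behaviour as much as on the hash itself.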

