PerlMonks  

Re^2: Why does Perl get slower when building a larger hash? (Not due to the memory swapping)

by chialingh (Initiate)
on Mar 01, 2013 at 16:21 UTC ( #1021292 )


in reply to Re: Why does Perl get slower when building a larger hash? (Not due to the memory swapping)
in thread Why does Perl get slower when building a larger hash? (Not due to the memory swapping)

Yes, what I did in the code was measure the time taken to compare every file with every other file. However, the time log I showed is the running time divided by the number of file pairs, i.e. $time/$x.

In that case, the time is the unit time for each comparison, and it shouldn't go up with the number of files, right?
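For reference, here is a minimal sketch of the normalization described above (the real script is not shown in this thread; the file list, the comparison, and the variable names besides $x and $time are stand-ins):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw( time );

# Hypothetical sketch: compare every file with every other file,
# count the pairs in $x, and divide the elapsed time by $x to get
# a per-comparison figure.
my @files = ( 'a' .. 'j' );    # stand-in for the real file list
my $x     = 0;                 # number of pairwise comparisons
my $start = time;
for my $i ( 0 .. $#files - 1 ) {
    for my $j ( $i + 1 .. $#files ) {
        # compare( $files[$i], $files[$j] );  # placeholder comparison
        $x++;
    }
}
my $time = time - $start;
printf "%d pairs, per-pair time: %.3g s\n", $x, $time / $x;
```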

Sorry for the misleading code; I'll revise it. :)


Re^3: Why does Perl get slower when building a larger hash? (Not due to the memory swapping)
by tmharish (Friar) on Mar 04, 2013 at 08:56 UTC

    Dividing by $x ( the number of pairs ) does not help, because you still have an O( log N ) factor in there ( the inner loop ).
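    To see why, here is a small numeric sketch (the constant $c is assumed for illustration, not measured from the original post): if each of the N*(N-1)/2 comparisons costs roughly $c * log2(N), then dividing the total time by the pair count leaves $c * log2(N), which still grows with N.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $c = 1e-6;    # assumed per-comparison constant, in seconds
for my $n ( 1_000, 10_000, 100_000 ) {
    my $pairs = $n * ( $n - 1 ) / 2;
    # Total time under the assumed cost model, then normalized per pair.
    my $total    = $pairs * $c * log($n) / log(2);
    my $per_pair = $total / $pairs;
    printf "N = %6d  per-pair time: %.3g s\n", $n, $per_pair;
}
```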

    As a demonstration, consider the following code that benchmarks hash access times:

    use strict ;
    use warnings ;

    use Time::HiRes qw( time ) ;

    my @bench_mark_points = ( 1_000, 10_000, 100_000, 1_000_000,
                              10_000_000, 20_000_000, 30_000_000,
                              40_000_000, 50_000_000 ) ;
    my $bench_mark_point  = 0 ;

    my %large_hash ;
    my @keys ;

    for( my $elem = 0 ; $elem < 50_000_001 ; $elem++ ) {
        my $key = rand( $elem ) . "hello" . rand( $elem );
        $large_hash{ $key } = 1 ;
        push @keys, $key ;
        if( $elem == $bench_mark_points[ $bench_mark_point ] ) {
            _bench_mark( $bench_mark_points[ $bench_mark_point ] ) ;
            $bench_mark_point++ ;
        }
    }

    sub _bench_mark {
        my $benchmark_point = shift ;
        my @benchmark_keys  = map( { $keys[ int( rand( $benchmark_point ) ) ] } ( 0 .. 1_000_000 ) ) ;
        my $total_time = 0 ;
        foreach my $key ( @benchmark_keys ) {
            my $start_time = time ;
            my $val        = $large_hash{ $key } ;
            my $end_time   = time ;
            $total_time += ( $end_time - $start_time ) ;
        }
        print "Benchmarked Hash access of size $benchmark_point \t -> Access time for 1_000_000 keys: " . $total_time . "\n" ;
        return ;
    }

    Output

    Benchmarked Hash access of size 1000    	 -> Access time for 1_000_000 keys: 0.11689305305481
    Benchmarked Hash access of size 10000   	 -> Access time for 1_000_000 keys: 0.121062278747559
    Benchmarked Hash access of size 100000   	 -> Access time for 1_000_000 keys: 0.125393152236938
    Benchmarked Hash access of size 1000000 	 -> Access time for 1_000_000 keys: 0.116819381713867
    Benchmarked Hash access of size 10000000 	 -> Access time for 1_000_000 keys: 0.118601083755493
    Benchmarked Hash access of size 20000000 	 -> Access time for 1_000_000 keys: 0.117170572280884
    

    You will notice that the time to access a million keys is nearly always the same. The memory, on the other hand, shoots up dramatically, reaching 7.8 GB at around 25 million entries ( which is why my bench-marking stops there ).

    Perl is working really hard in the background to build structures that keep finding an element ( which would otherwise be O(N) ) close to constant time here.

    UPDATE: Here is the output again with the CPU throttled so the difference can actually be seen:

    Benchmarked Hash access of size 1000    	 -> Access time for 1_000_000 keys: 0.438439846038818
    Benchmarked Hash access of size 10000   	 -> Access time for 1_000_000 keys: 0.467595815658569
    Benchmarked Hash access of size 100000  	 -> Access time for 1_000_000 keys: 0.615071773529053
    Benchmarked Hash access of size 1000000 	 -> Access time for 1_000_000 keys: 0.804181814193726
    Benchmarked Hash access of size 10000000 	 -> Access time for 1_000_000 keys: 0.873048782348633
    Benchmarked Hash access of size 20000000 	 -> Access time for 1_000_000 keys: 0.910530567169189
    

    So the time does go up, and that's expected. You cannot find an element in less than O( log N ). The speed here is because Perl optimizes this at a low level, but eventually the effect adds up.
