PerlMonks  

Re: Averaging Elements in Array of Array

by hda (Chaplain)
on Dec 26, 2008 at 13:44 UTC ( [id://732675] )


in reply to Averaging Elements in Array of Array

Hi!

Yes, there is a fast way, using PDL (the Perl Data Language):

http://pdl.perl.org/

Namely, you can import your data into a piddle (a PDL object), slice out the column, and then run "stats" over it:
    use PDL;
    my $piddle = pdl @your_array;
    my $col    = $piddle->slice("$column,:");   # double quotes so $column interpolates
    print stats $col;

Where "$column,:" selects the whole y range (:) of column $column and dumps it into $col. You can then, for example, iterate over every column you are interested in. Other, more advanced uses are certainly possible, such as taking all the averages at the same time, but those are beyond my current knowledge.
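For comparison, here is what the same column averages look like in plain Perl without PDL, as a minimal sketch (the names @aoa and column_means are made up for illustration; the sample data is the small matrix used in the benchmarks below):

```perl
use strict;
use warnings;

# Compute the mean of every column of an array of arrays,
# the "conventional" counterpart to the PDL slice/stats approach.
sub column_means {
    my @aoa = @_;
    my @sums;
    for my $row (@aoa) {
        $sums[$_] += $row->[$_] for 0 .. $#$row;
    }
    return map { $_ / @aoa } @sums;
}

my @aoa = ( [ 0, 3, 2, 1 ], [ 1, 11, 1, 2 ], [ 5, -2, 0, 1 ] );
my @means = column_means(@aoa);
print "@means\n";    # 2 4 1 1.33333333333333
```

This keeps no dependency on PDL, but as the benchmarks below show, the trade-off flips once the matrix gets large.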

Hope this helps

Replies are listed 'Best First'.
Re^2: Averaging Elements in Array of Array
by bruno (Friar) on Dec 26, 2008 at 14:41 UTC
    I thought so too, but here's the benchmark for it:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw/cmpthese/;
    use PDL::LiteF;

    my @data = (
        [ 0, 3,  2, 1 ],
        [ 1, 11, 1, 2 ],
        [ 5, -2, 0, 1 ],
    );

    # It's not fair to make the conversion every time.
    my $pdldata = pdl @data;

    sub using_array {
        my @data = @_;
        my @sums;
        for my $i ( 0 .. $#data ) {
            $sums[0] += $data[$i][0];
            $sums[1] += $data[$i][1];
            $sums[2] += $data[$i][2];
            $sums[3] += $data[$i][3];
        }
        $sums[$_] /= @data for 0 .. 3;
        return @sums;
    }

    sub using_pdl {
        my $pdldata = shift;
        $pdldata /= $pdldata->getdim(1);
        return $pdldata->transpose->sumover;
    }

    cmpthese(
        100000,
        {
            'Array-based' => sub { using_array(@data) },
            'PDL-based'   => sub { using_pdl($pdldata) },
        }
    );
    Result:
                     Rate   PDL-based Array-based
    PDL-based     36496/s          --        -67%
    Array-based  111111/s        204%          --
    Apparently, for a dataset of this size (3 by 4) it's not worth it to use PDL. The good thing, though, is that the PDL subroutine can be applied to a bidimensional piddle of arbitrary size without modification.

    I suppose that PDL scales much better, though: I've used it for multidimensional piddles of 1e7 elements with a 50-fold increase in speed over a traditional array-based implementation.

      Bruno, you are completely right: in this case the use of PDL is justified only when working with large arrays. I simply assumed that neversaint's array was just an example and that the real problem was a bit more complicated.
      Building on the shoulders of giants, here's a more generalized benchmarker. It shows that while the PDL approach is slower for a 5x5 matrix, it quickly becomes the fastest choice as the matrix size grows. For example, given a 30x30 matrix, one can average its columns with the PDL method about 7 times faster than with conventional methods. Imagine the gains with dimensions in the hundreds or thousands.
      #!/usr/bin/perl
      use strict;
      use warnings;
      use Benchmark qw/cmpthese/;
      use PDL::LiteF;

      my @number_of_arrays = qw(5 15 30);
      my @size_of_arrays   = qw(5 15 30);
      my $iterations       = 50000;
      my $max_integer      = 100;

      benchmark_it( \@number_of_arrays, \@size_of_arrays, $max_integer );

      #----------------
      sub benchmark_it {
          my $number_of_arrays   = shift;
          my $size_of_arrays     = shift;
          my $max_random_integer = shift;
          for my $number ( @{$number_of_arrays} ) {
              for my $size ( @{$size_of_arrays} ) {
                  my $data    = build_random_array( $number, $size, $max_random_integer );
                  my $pdldata = pdl $data;
                  print "Results when number of arrays is $number and size of each array is $size:\n";
                  cmpthese(
                      $iterations,
                      {
                          'Array-based' => sub { using_array($data) },
                          'PDL-based'   => sub { using_pdl($pdldata) },
                          'Map-based'   => sub { using_map($data) },
                      }
                  );
                  print "\n";
              }
          }
      }

      sub using_array {
          my $data = shift;
          my @sums;
          my $last_row_index    = scalar @{$data} - 1;
          my $last_column_index = scalar @{ $data->[0] } - 1;
          for my $i ( 0 .. $last_row_index ) {
              for my $j ( 0 .. $last_column_index ) {
                  $sums[$j] += $data->[$i][$j];
              }

              # Hard-coded indices run faster.
              # $sums[0] += $data[$i][0];
              # $sums[1] += $data[$i][1];
              # $sums[2] += $data[$i][2];
              # $sums[3] += $data[$i][3];
          }
          $sums[$_] /= ( $last_row_index + 1 ) for 0 .. $last_column_index;
          return @sums;
      }

      sub using_map {
          my $data      = shift;
          my $range_max = scalar @{ $data->[0] } - 1;
          my @sums;
          map {
              for my $j ( 0 .. $range_max ) {
                  $sums[$j] += $_->[$j];
              }
          } @{$data};
          return \@sums;
      }

      sub using_pdl {
          my $pdldata = shift;
          $pdldata /= $pdldata->getdim(1);
          return $pdldata->transpose->sumover;
      }

      sub build_random_array {
          my $number_of_arrays = shift || 10;
          my $size_of_arrays   = shift || 10;
          my $max_integer      = shift || 100;
          my $data;
          foreach my $i ( 1 .. $number_of_arrays ) {
              my @random_array;
              push @random_array, int rand( $max_integer + 1 ) for 1 .. $size_of_arrays;
              push @{$data}, \@random_array;
          }
          return $data;
      }

      __END__

      =head1 Synopsis

      Compare PDL to more conventional methods of finding the average of the
      column vectors in a 2D matrix.

      =head1 Results

      My results on December 27, 2008

      Results when number of arrays is 5 and size of each array is 5:
                       Rate   PDL-based Array-based   Map-based
      PDL-based     42017/s          --        -50%        -57%
      Array-based   84746/s        102%          --        -14%
      Map-based     98039/s        133%         16%          --

      Results when number of arrays is 30 and size of each array is 30:
                     Rate Array-based   Map-based   PDL-based
      Array-based  3987/s          --        -17%        -89%
      Map-based    4808/s         21%          --        -86%
      PDL-based   35461/s        789%        638%          --

      =head1 Notes

      Note that when the matrix is small, 5x5, PDL is slower, but as the size of the
      matrix grows, PDL becomes smokin' hot thanks to its speed.

      It's nice to see the recent development activity with PDL.

      =cut
