Commandosupremo has asked for the
wisdom of the Perl Monks concerning the following question:
Hi, I have a slight problem that I'm hoping the community here might be able to help me with.
The project I am currently working on requires that I work with a number of matrices. To facilitate this, I have represented each matrix as an object with corresponding methods such as clone, multiply, etc. The problem I am having is that the algorithm I am working with requires me to create deep copies of the matrices an inordinate number of times. I have been doing this using my 'clone' method, and using Devel::SmallProf I have determined that the clone method is the bottleneck in my code, taking up around 80% of the runtime.
I was hoping someone could look at the method I have written and offer any possible improvements. I am currently using an accessors module, which I know can slow things down a little. Also, $self->Matrix is an array ref for a 2D matrix. Below is the code:
sub Clone
{
    my $self   = shift @_;
    my $matrix = $self->Matrix;
    return Matrix->new(
        "Matrix" => [ map { my $row = $_; [ map { $_ } @$row ] } @$matrix ]
    );
}
Thank you in advance for any help anyone has to offer. While I realize it might be an unavoidable problem, I figured it was worth a shot to ask. According to SmallProf, this method gets called 35 million times when running my test data, so any improvements or suggestions would be greatly appreciated. Also, I have tried a few other ways to do this, but this has been the fastest so far.
Re: Performance problem with Clone Method by davido (Archbishop) on Jul 26, 2011 at 23:07 UTC 
You're unnecessarily using a second level of map. Since your matrix is 2-D (no additional layers), there's no need to do any more than copy each second-level row, which can be done with [ @$_ ] inside the outer map.
In testing the code below I verified using Data::Dumper that both copies of @matrix were identical in structure. Then I created a copy and changed one of its elements to "Problem!". Then I reprinted the original @matrix to ensure that "Problem!" didn't propagate back to the original matrix (which would have indicated that I didn't get a copy, but rather an alias).
After I was sure that I had duplicated your original functionality I went ahead with benchmarks. As you'll see the new "onemap" method is significantly faster.
use strict;
use warnings;
use Benchmark qw/cmpthese/;
my @matrix = (
[ 1, 2, 3, 4, 5, ],
[ 6, 7, 8, 9, 10, ],
[ 11, 12, 13, 14, 15, ],
[ 16, 17, 18, 19, 20, ],
[ 21, 22, 23, 24, 25, ],
);
cmpthese(
50000,
{
onemap => sub{ my $result = onemap( \@matrix ); },
twomaps => sub{ my $result = twomaps( \@matrix ); },
}
);
sub onemap {
    my $matrix = shift;
    return { Matrix => [ map { [ @$_ ] } @$matrix ] };
}

sub twomaps {
    my $matrix = shift;
    return {
        Matrix => [
            map {
                my $row = $_;
                [ map { $_ } @$row ]
            } @$matrix
        ]
    };
}
__END__
            Rate twomaps  onemap
twomaps  62814/s      --    -39%
onemap  103306/s     64%      --
You would need to plug the algorithm back into your method, of course.
Update with additional benchmarking:
I added the Clone module to my previous example and benchmarked all over again. I also generated a matrix of 100x100 random integers to more closely approximate the size of your data structure. As I suspected, with a larger data structure the time spent in subroutine call overhead melted into the background, better showcasing the performance differences between the algorithms themselves. Here's the code and the results:
Dave:
I made the change from the nested maps to a single map and it made a considerable improvement, thank you very much. I've never used the Benchmark module, but I think I will start using it as well; its output looks more useful to me than SmallProf's.
Glad it helped. By the way, there's a module on CPAN, Clone, which provides increased flexibility (it can handle data structures of arbitrary shape) with its clone() function. However, it is actually considerably slower than even your first method. It's an example of where the increased abstraction that makes it a more useful tool all around also carries a performance cost.
diotalevi has (or had perhaps) a module on CPAN, Clone::Fast, but when I went to install it with cpan Clone::Fast, the cpan utility couldn't find it. I would like to have tried benchmarking it.
Another aside: Storable (in the core distribution) offers dclone(), but according to the Clone documentation it's even slower, while being even more flexible.
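For reference, a minimal sketch of the Storable approach (dclone is exported on request; this is not code from the thread, just an illustration of the API):

```perl
use strict;
use warnings;
use Storable qw(dclone);   # ships with Perl

my $matrix = [ [ 1, 2 ], [ 3, 4 ] ];

# dclone() walks the structure recursively, so it handles any nesting depth,
# at the cost of being slower than a hand-rolled copy for a known 2-D shape.
my $copy = dclone($matrix);

# Mutating the copy must not affect the original.
$copy->[0][0] = 'changed';
print $matrix->[0][0], "\n";   # prints 1: the original is untouched
```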
Re: Performance problem with Clone Method by BrowserUk (Pope) on Jul 26, 2011 at 23:12 UTC 
This does the same thing as your Clone method but takes a third of the time (for 10x10 matrices), just by not doing the completely redundant second map:
sub Clone2 {
    my $self   = shift @_;
    my $matrix = $self->Matrix;
    return Matrix->new(
        "Matrix" => [ map { [ @$_ ] } @$matrix ]
    );
}
If your matrices are very small (3x5 or 4x4 or less), then it might be worth breaking some OO taboos to gain a little more. This also does the same thing, but avoids a couple of method calls per invocation:
sub Clone3 {
    my $self = shift;
    bless {
        "Matrix" => [ map { [ @$_ ] } @{ $self->{Matrix} } ]
    }, 'Matrix';
}
But in the end, if a call that takes 3 milliseconds (your original, for a 100x100 matrix) takes 80% of your time, then you should really start questioning why it is necessary to clone something 35 million times. It seems to me that using a better algorithm is likely to produce far greater savings.
Examine what is said, not who speaks. Silence betokens consent. Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I have implemented what you called Clone2 in place of my original clone and it does make a substantial difference.
The matrices I am working with are very large, on the order of hundreds by hundreds.
As for changing algorithms, I wish I could, but it is not an option: no other algorithm provides as simple an implementation, nor the ability to modify it, as this one does. Just FYI, the algorithm I am working with is the Ullmann algorithm for finding maximum subgraph isomorphism.
Lastly, I should point out that I believe I was wrong about the number of times the clone method is being called. According to SmallProf, the original Clone method is called 35 million times, but I don't believe this is accurate: after replacing Clone with Clone2, which should be called the same number of times, SmallProf says it is being called 1.06 million times. I know this is ancillary, but it's curious. I don't know if this is an error or a limitation in SmallProf, but it's worth pointing out nonetheless.
Re: Performance problem with Clone Method by Jenda (Abbot) on Jul 27, 2011 at 08:43 UTC 
I may be missing something, but isn't this exactly what PDL was created for?
Jenda
Enoch was right!
Enjoy the last years of Rome.
BrowserUk, you don't think this is actually a problem, do you? PDL is a very mature Perl extension that handles high-dimensional data sets very nicely. In fact (surprise!) there's more than one way to do it. At the moment, I believe the best documentation for beginners is the MATLAB or Scilab migration guides. They say that PDL::QuickStart is a better place to start, but I humbly disagree. See http://pdl.perl.org/?docs=Tutorials&title=PDL::Tutorials for the (short) list of tutorials available.
To answer your specific question, suppose you have a 2x4 piddle, perhaps created with the sequence command. A snippet of your code might look like this:
my $pdl = sequence(2,4);
If you wanted to modify the (0,3) element of the array (first column, last row), you would use NiceSlice and the .= notation like so:
$pdl(0,3) .= 4;
A full working example would look like this:
use strict;
use warnings;
use PDL;
use PDL::NiceSlice;
my $pdl = sequence(2,4);
print "$pdl\n";
$pdl(0,3) .= 4;
print "$pdl\n";
The output looks like this:
[
[0 1]
[2 3]
[4 5]
[6 7]
]
[
[ 0 1]
[ 2 3]
[ 4 5]
[4 7]
]
Edit: revised the opening paragraph to be more useful
Edit2: rerevised the opening paragraph to be even more useful
Edit3: used code tags instead of pre tags for example code and output
PDL(i,j) manipulation:
Look at the get and set functions in PDL::Func.
To note, the difference between column and row operations is the difference between zero transpose operations and two of them. That said, between PDL::Slices and PDL::NiceSlice there are so many ways to skin this cat that I'm surprised people worry about the issue of column/row operations in PDL.
#!/usr/bin/perl
use PDL;
my $M = sequence(10,10);
print $M;
my $row  = $M->slice(':,3');
my $col  = $M->slice('4,:');
my $col2 = $M->slice('5,:');
my $deep_col = $col->copy;
$col  .= $col2;
$col2 .= $deep_col;
print $M;
See also reorder, dice, range etc.
Update: the original code example was broken.
If you're doing lots of row or column manipulations, perhaps better to keep a 1d piddle of index values and manipulate those.
Re: Performance problem with Clone Method by thirdm (Sexton) on Jul 26, 2011 at 22:40 UTC 
Try using a flat array as the data structure, indexing it as Matrix(m, n) => Array[m*ncols + n] or Matrix(m, n) => Array[n*nrows + m], depending on whether you think row-major or column-major will be better for you.
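To make the row-major variant concrete, here is a minimal sketch (mat_get and mat_set are hypothetical helper names, not from the thread). The payoff for cloning is that a deep copy of the whole matrix becomes a single shallow copy of one flat array:

```perl
use strict;
use warnings;

# A 3x3 matrix stored row-major in one flat array:
# element (m, n) lives at index m*ncols + n.
my ($nrows, $ncols) = (3, 3);
my @flat = (1 .. $nrows * $ncols);

sub mat_get { my ($flat, $m, $n) = @_; return $flat->[ $m * $ncols + $n ]; }
sub mat_set { my ($flat, $m, $n, $v) = @_; $flat->[ $m * $ncols + $n ] = $v; }

# Cloning is now one list copy -- no nested references to walk.
my @clone = @flat;
mat_set(\@clone, 0, 0, 99);
print mat_get(\@flat,  0, 0), "\n";   # 1: original untouched
print mat_get(\@clone, 1, 2), "\n";   # 6: row 1, col 2 => index 5
```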
All these clones: are they all for good reason, or would it be worth considering some kind of copy-on-write scheme?

Thanks for the advice, I will be sure to try that tomorrow when I get a chance to make the modifications.
From my understanding of copy on write, I would only truly copy the data when it is being modified, not before. Unfortunately, the algorithm I am working with is constantly modifying the matrices, so I don't think copy on write would work. Basically, the algorithm enumerates the matrices in a depth-first scheme, and once it exhausts one branch of the search tree, it loads a stored copy of the matrix to try a different branch. I am generating these stored copies using clone.
Thank you for the advice.
Are these matrices each the same size? Could you recycle the underlying arrays from abandoned branches?
Or, if you have some idea of your storage requirements per branch, maybe you could preallocate in some chunk size large enough to hold several (or more) of your matrices and keep track of offsets to free ones. Your indexing then becomes M(m, n) => $all_arrays[$offset + m*ncols + n], where you find $offset upon doing a clone by consulting some kind of free list or bitmap. The underlying array's size would be k * nrows * ncols, where k is your chunk size, and $offset would take values in [0, nrows*ncols, 2*nrows*ncols, ..., (k-1)*nrows*ncols]. I guess this is kind of writing your own memory manager, which may be too much trouble, but perhaps it would pay off. I'm assuming here that the worst part of your algorithm's performance (assuming it's already the best algorithm, as an algorithm, that you know of) comes from memory allocation, and that asking Perl to allocate one large array once will be faster than asking it many times to allocate small arrays, but my intuition may be skewed from doing non-Perl work.
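The free-list idea above can be sketched in a few lines, even without the flat-array layout. This is an illustration, not code from the thread; clone_matrix and release_matrix are hypothetical names, and only the outer buffer is recycled here (the row arrays could be pooled the same way):

```perl
use strict;
use warnings;

# Buffers abandoned by exhausted search branches are parked here
# and reused instead of asking Perl for a fresh allocation.
my @free_list;

sub clone_matrix {
    my ($matrix) = @_;
    my $copy = pop(@free_list) // [];     # reuse a parked buffer if any
    @$copy = map { [ @$_ ] } @$matrix;    # overwrite its contents in place
    return $copy;
}

sub release_matrix {                      # call when a branch is abandoned
    my ($matrix) = @_;
    push @free_list, $matrix;
}

my $m    = [ [ 1, 2 ], [ 3, 4 ] ];
my $copy = clone_matrix($m);
$copy->[0][0] = 99;
print $m->[0][0], "\n";                   # 1: the original is unchanged
release_matrix($copy);
print scalar @free_list, "\n";            # 1: buffer parked for reuse
```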
Re: Performance problem with Clone Method by dcmertens (Beadle) on Jul 27, 2011 at 13:18 UTC 
Reiterating what Jenda already asked, why aren't you using PDL?