Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

Re^2: Performance problem with Clone Method

by BrowserUk (Pope)
on Jul 27, 2011 at 09:09 UTC ( #916967=note: print w/ replies, xml ) Need Help??


in reply to Re: Performance problem with Clone Method
in thread Performance problem with Clone Method

Problem: How do you access/manipulate the current value of a single number (the value stored at a single indexed position) in a piddle?

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.


Comment on Re^2: Performance problem with Clone Method
Re^3: Performance problem with Clone Method
by dcmertens (Beadle) on Jul 27, 2011 at 12:47 UTC

    BrowserUK, you don't think this is actually a problem, do you? PDL is a very mature Perl extension that handles high dimensional data sets very nicely. In fact (surprise!) there's more than one way to do it. At the moment, I believe the best documentation for beginners is the Matlab or Scilab migration guides. They say that PDL::QuickStart is a better place to start, but I humbly disagree. See http://pdl.perl.org/?docs=Tutorials&title=PDL::Tutorials for the (short) list of tutorials available.

    To answer your specific question, suppose you have a 2x4 piddle, perhaps created with the sequence command. A snippet of your code might look like this:

    my $pdl = sequence(2,4);

    If you wanted to modify the (0,3) element of the array (first column, last row), you would use NiceSlice and the .= notation like so:

    $pdl(0,3) .= -4;

    A full working example would look like this:

    use strict; use warnings; use PDL; use PDL::NiceSlice; my $pdl = sequence(2,4); print "$pdl\n"; $pdl(0,3) .= -4; print "$pdl\n";

    The output looks like this:

    [ [0 1] [2 3] [4 5] [6 7] ] [ [ 0 1] [ 2 3] [ 4 5] [-4 7] ]

    Edit: revised the opening paragraph to be more useful

    Edit2: re-revised the opening paragraph to be even more useful

    Edit3: used code tags instead of pre tags for example code and ouput

      I think that if an algorithm requires access to individual elements of the piddles, rather than being able to apply single operations to whole piddles at a time, there is little to be gained. What you might gain from more efficient duplication, you lose by having to call one or two functions per element during calculation or conditional testing.

      From my very limited understanding of the OPs problem, much of the effort involved in the algorithm involves:

      • interchanging whole rows & whole columns in 2D arrays.

        I don't think that PDL is particularly efficient at performing these operations, especially the latter.

      • performing "bit-wise" boolean operations and 'counting-the-set-bits', on pairs of rows of zeros and ones.

        If the rows of 0s and 1s were encoded as simple bit-vectors, then not only can standard Perl can perform both these operations more efficiently than PDL, the storage requirements are 8x less.

      I'd be very happy to be wrong here, but I just don't think PDL suits these particular types of operations.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        Unfortunately I don't know anything about the OP's problem and an internet search has not proved forthcoming with a simple explanation, in matrix terms, of the Ullmann algorithm. If Commandosupremo could spell out his implementation of the algorithm, giving a list of the operations needed, I could say PDL is the right tool or not. As to your specific points, PDL may be able to handle the operations you pose, depending on what you mean.

        PDL supports a whole slew of operations including bitwise boolean operations, as mentioned here: http://pdl.perl.org/?docs=Ops&title=PDL::Ops. For counting the set bits, that's just a sum; you can perform that and many other row or whole-piddle operations as discussed here: http://pdl.perl.org/?docs=Ufunc&title=PDL::Ufunc. The implementations of the Ufuncs use simple accumulators, which loses precision when you're summing over thousands or millions of floating-point values, but will work just fine if you're summing a few hundred integers, as is the case (I believe) of the OP.

        I am note very experienced at exchanging whole rows or columns using PDL, though I know it is something that PDL manages to do with data flow. In other words, when you generate piddle $b from piddle $a with a row exchanged, you can modify piddle $a and it'll show up in piddle $b, and vice versa. Put differently, exchanges lead to different 'views' of the same data. This can be very powerful, but for the OP it may simply lead to inefficiencies in the back-end.

        I will pose posed this question to the PDL mailing list shortly to see if anybody can help. It would be much more useful if Commandosupremo could post the actual algorithm. At any rate, interchanging whole rows in 2D arrays, and whole columns in 2D arrays, is more difficult than I had expected, and there might be more efficient ways to do it using PDL.To exchange rows or columns, you can use the dice_axis function, like so:

        use warnings; use PDL; use PDL::NiceSlice; # Here is my matrix: my $data = sequence(5,5); print "Original:\n$data\n"; # In order to permute columns, I need a *column* vector with the # column numbers that I want. (You'd think a *row* vector would be # more appropriate, but that's not how it works. Alas.) my $indices = sequence(5); # Set up to exchange the second and the third column: #$indices(1:2) .= $indices(2:1); ### should work but doesn't for me $indices(1:2) .= $indices(pdl [2,1]); # Perform the exchange: my $rearranged = $data->dice_axis(0, $indices); print "Column swap:\n$rearranged\n"; # Start over and exchange the fourth and fifth rows: $indices = sequence(5); $indices(3:4) .= $indices(pdl[4,3]); my $further_rearranged = $rearranged->dice_axis(1, $indices); print "Column and row swap:\n$further_rearranged\n";
        The output I get from that looks like this:
        Original: [ [ 0 1 2 3 4] [ 5 6 7 8 9] [10 11 12 13 14] [15 16 17 18 19] [20 21 22 23 24] ] Column swap: [ [ 0 2 1 3 4] [ 5 7 6 8 9] [10 12 11 13 14] [15 17 16 18 19] [20 22 21 23 24] ] Column and row swap: [ [ 0 2 1 3 4] [ 5 7 6 8 9] [10 12 11 13 14] [20 22 21 23 24] [15 17 16 18 19] ]

        Edit: updated to use dice_axis

Re^3: Performance problem with Clone Method
by dwm042 (Priest) on Jul 27, 2011 at 21:40 UTC
    PDL(i,j) manipulation:

    Look at the get and set functions in PDL::Func.

    To note, the difference between column and row operations are the difference between 0 transpose operations and 2 of them. That said, between PDL::Slices and PDL::NiceSlice there are so many ways to skin this cat I'm surprised people worry about the issue of column/row operations in PDL.

    #! /usr/bin/perl use PDL; my $M = sequence(10,10); print $M; my $row = $M->slice(':,3'); my $col = $M->slice('4,:'); my $col2 = $M->slice('5,:'); my $deep_col = $col->copy; $col .= $col2; $col2 .= $deep_col; print $M;


    See also reorder, dice, range etc. Update:original code example broken.

    If you're doing lots of row or column manipulations, perhaps better to keep a 1d piddle of index values and manipulate those.
      there are so many ways to skin this cat I'm surprised people worry about the issue of column/row operations in PDL.

      The problem is not "can it be done with PDL", more "do you gain anything by using PDL to do it"? Ie. Is it more efficient?

      This iterates through all the row and column permutations of a 10x10 matrix in 82 seconds:

      #! perl -slw use strict; use Data::Dump qw[ pp ]; use Time::HiRes qw[ time ]; #$Data::Dump::WIDTH = 1000; use Algorithm::Combinatorics qw[ permutations ]; my @a = map[ 10 *$_ .. 10 *$_ + 9 ], 0 .. 9; ## rows my $start = time; my $perms = permutations( [ 0 .. 9 ] ); while( my $p = $perms->next ) { my @perm = @a[ @$p ]; } printf "All row permutations took %f seconds\n", time - $start; ## cols $start = time; $perms = permutations( [ 0 .. 9 ] ); while( my $p = $perms->next ) { my @perm = map[ @{$_}[ @$p ] ], @a; } printf "All column permutations took %f seconds\n", time - $start; __END__ C:\test>PermsMatrix.pl All row permutations took 17.198000 seconds All column permutations took 65.284842 seconds

      The OP mentioned matrices of "hundreds x hundreds".

      As I understand the brute force algorithm for the Subgraph Isomorphism Problem, it requires performing all the row permutations for all the column permutations of the smaller of the two adjacency matrix graphs for every equal sized subgraph of the larger adjacency matrix. Ullmann trims the tree somewhat, but essentially still requires many of the iterations and all the transformations to be performed.

      I've no doubts that this can be done with PDL; I just wonder if you gain much performance?


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        I read this thread through 3-4 times before I "got it", which is that PDL is really a floating point engine that makes things like singular value decomposition a one liner.

        And that back in 1983, the kind of programming the OP is interested in would have been written in assembler and/or C, and so bitwise efficiency is to be optimized here.

        D-

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://916967]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (8)
As of 2014-07-29 07:17 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (211 votes), past polls