Re: RFC: Fuzzy Clustering with Perl

I'm going to guess that this is a straight transfer of a C program to Perl --

First, you're using a whole lot of for loops for tracking indexes to arrays:

for (my $i = 0; $i < $number_of_clusters; $i++) { ... }

In Perl, if you're just trying to iterate over a range, you can use the 'foreach' style loop, with the range operator:

for my $i ( 0 .. $number_of_clusters-1 ) { ... }

Even if we were doing this in C, for the type of loops you're dealing with (starting at 0, order of operations doesn't matter), I'd still change the code, to reduce the number of comparisons against non-0 values:

for (i = number_of_clusters; i--; ) { ... }
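
The same countdown form works in Perl. Here is a minimal sketch, with a made-up value for $number_of_clusters, that records the order in which the indexes are visited. The test `$i--` checks the old value and the body then sees the decremented one, so the loop runs from the top index down to 0:

```perl
use strict;
use warnings;

# Hypothetical value, chosen only for illustration.
my $number_of_clusters = 4;

# The test "$i--" is evaluated before each pass: the old value decides
# whether the loop continues, and the body then sees the decremented one.
my @visited;
for (my $i = $number_of_clusters; $i--; ) {
    push @visited, $i;
}

print "@visited\n";    # prints "3 2 1 0"
```

Note that the indexes come out in reverse, so this only suits loops where the order of iteration doesn't matter.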

In some cases, we can use the 'foreach' style loops to eliminate the need for index values, but because you're using them to index multiple arrays within the loop, that's not possible in your case.
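
For comparison, here is a rough sketch of a case where the index can go away, because only one array is touched inside the loop. The array name and its contents are made up for the example and are not taken from your script:

```perl
use strict;
use warnings;

# Hypothetical data: number of points assigned to each cluster.
my @cluster_sizes = (12, 7, 5);

# Index-based: $i exists only to reach into the array.
my $sum_by_index = 0;
for my $i ( 0 .. $#cluster_sizes ) {
    $sum_by_index += $cluster_sizes[$i];
}

# Element-based: the loop variable is the value itself.
my $sum_by_element = 0;
for my $size (@cluster_sizes) {
    $sum_by_element += $size;
}

print "$sum_by_index $sum_by_element\n";    # prints "24 24"
```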

...

Another change I might make is in how you deal with undefined values -- if the value must be defined, and can't be 0, (eg, $number_of_clusters), you can use the '||=' operator:

$number_of_clusters ||= 2;
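
To make the "can't be 0" condition concrete, here is a small sketch comparing '||=' with the defined-or operator '//=', which is only available in Perl 5.10 and later. The variable names are invented for the example:

```perl
use strict;
use warnings;
use 5.010;    # the defined-or operator '//=' needs Perl 5.10 or later

# Hypothetical inputs: one missing, one explicitly zero.
my ($missing, $zero) = (undef, 0);

# '||=' assigns when the left side is false (undef, 0, '0', or '').
my $a_or = $missing;  $a_or ||= 2;   # becomes 2
my $b_or = $zero;     $b_or ||= 2;   # also becomes 2

# '//=' assigns only when the left side is undef.
my $a_dor = $missing; $a_dor //= 2;  # becomes 2
my $b_dor = $zero;    $b_dor //= 2;  # stays 0

print "$a_or $b_or $a_dor $b_dor\n";    # prints "2 2 2 0"
```

For $number_of_clusters, where 0 is not a usable value anyway, '||=' is the better fit. '//=' is for cases where 0 is a legitimate setting.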

...

The only other thing is in how it's called -- if it were OO, you could inherit from it and then replace the 'distance' function (or you could have it accept a coderef for the distance function, if you didn't want to support inheritance), as some people prefer the Manhattan distance when they're dealing with clusters:

sub distance {
  my ($vec1,$vec2) = @_;
  my $distance = 0;
  # Manhattan distance: sum of the absolute coordinate differences
  for my $i ( 0 .. $#$vec1 ) {
    $distance += abs( $vec1->[$i] - $vec2->[$i] );
  }
  return $distance;
}
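
The coderef variant could look roughly like the sketch below. The helper nearest_centre and the sample data are hypothetical, invented only to show a distance function being passed in. They are not part of the original script, which computes fuzzy memberships rather than a single nearest centre:

```perl
use strict;
use warnings;

# Manhattan distance: sum of absolute coordinate differences.
sub manhattan_distance {
    my ($vec1, $vec2) = @_;
    my $distance = 0;
    $distance += abs( $vec1->[$_] - $vec2->[$_] ) for 0 .. $#$vec1;
    return $distance;
}

# Euclidean distance: square root of the summed squared differences.
sub euclidean_distance {
    my ($vec1, $vec2) = @_;
    my $sum = 0;
    $sum += ( $vec1->[$_] - $vec2->[$_] ) ** 2 for 0 .. $#$vec1;
    return sqrt $sum;
}

# Hypothetical helper: index of the centre closest to $point, measured
# with whatever distance function the caller passes in as a coderef.
sub nearest_centre {
    my ($point, $centres, $distance) = @_;
    $distance ||= \&euclidean_distance;    # default when none is given
    my ($best_index, $best_distance);
    for my $i ( 0 .. $#$centres ) {
        my $d = $distance->( $point, $centres->[$i] );
        if ( !defined $best_distance || $d < $best_distance ) {
            ($best_index, $best_distance) = ($i, $d);
        }
    }
    return $best_index;
}

# Made-up data where the two metrics disagree.
my @centres = ( [3, 3], [5, 0] );
my $point   = [0, 0];

print nearest_centre( $point, \@centres ), "\n";                        # prints "0"
print nearest_centre( $point, \@centres, \&manhattan_distance ), "\n";  # prints "1"
```

The caller picks the metric without the clustering code having to know about it, which is most of what inheritance would buy you here.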

Replies are listed 'Best First'.
Re^2: RFC: Fuzzy Clustering with Perl by lin0 (Curate) on Nov 03, 2006 at 20:56 UTC
jhourcle,

Thank you very much for your comments. I will work on a new version of the script, following your suggestions, and I will post it some time next week. I will address some of your comments below.

"I'm going to guess that this is a straight transfer of a C program to Perl --"

This is a very good guess. In fact, it is true! I have been programming in Perl for three months now and I recognize that I have a long way to go. This is one of the reasons I am asking for comments, so that I can improve my Perl coding skills in the least amount of time possible.

"First, you're using a whole lot of for loops for tracking indexes to arrays:"

That is true. I am trying to overcome that habit.

"In Perl, if you're just trying to iterate over a range, you can use the 'foreach' style loop, with the range operator:"

`for my $i ( 0 .. $number_of_clusters-1 ) { ... }`

"Even if we were doing this in C, for the type of loops you're dealing with (starting at 0, order of operations doesn't matter), I'd still change the code, to reduce the number of comparisons against non-0 values:"

`for (i = number_of_clusters; i--; ) { ... }`

I like the two options. However, maybe the first one is easier to understand for someone who is new to Perl. For the second one, you have to be clear that in Perl the value of $i is tested first, which allows the loop to continue, and only then is the variable decremented. This might be hard to see for someone new to the language (I had to try it to see what it did).

"Another change I might make is in how you deal with undefined values -- if the value must be defined, and can't be 0, (eg, $number_of_clusters), you can use the '||=' operator:"

`$number_of_clusters ||= 2;`

Thank you for the pointer. Trying the ||= operator made me realize that $number_of_clusters cannot be negative either. So maybe I should do

`my $number_of_clusters = abs(shift @ARGV);`

followed by the line you suggested. Is there another way around that?

"The only other thing is in how it's called -- if it were OO, you could inherit from it and then replace the 'distance' function (or you could have it accept a coderef for the distance function, if you didn't want to support inheritance), as some people prefer the Manhattan distance when they're dealing with clusters:"

I have to think about this. I have to study OO in Perl first.

Thanks again.

lin0
Re^3: RFC: Fuzzy Clustering with Perl by Anonymous Monk on Nov 08, 2006 at 02:10 UTC
Just out of curiosity, why are you implementing this in Perl? If the number of features goes beyond, say, 5 for any reasonable dataset, this will be too slow to be of much use. I'd think this is the sort of thing you'd implement in C and then provide Perl bindings for...
Re^4: RFC: Fuzzy Clustering with Perl by lin0 (Curate) on Nov 08, 2006 at 18:05 UTC
Hello,

Thank you for your comments. I will try to address them to the best of my knowledge.

"Just out of curiosity, why are you implementing this in Perl?"

I am interested in developing a granular computing implementation using Perl. You can see this post I wrote on the topic. Clustering is an essential part of a granular computing implementation, and because I could not find any previous implementation of Fuzzy C-means in Perl, I decided to write one (basically, I just ported code I had written in C to Perl). I also saw writing a Perl implementation of Fuzzy C-means as a learning opportunity. I have been programming in Perl for three months, so I decided this was a good starting project. Moreover, I need to gain a better understanding of how to program in Perl to be able to start my granular computing implementation. That is the final goal.

"If the number of features goes beyond, say 5, for any reasonable dataset, this will be too slow to be of much use."

That could certainly be the case. However, for the projects I am planning to use this for, I do not expect to have many more than 5 features. In any case, to make it more general, I will start thinking about how to speed up the processing.

"I'd think this is the sort of thing you'd implement in C and then provide Perl bindings for..."

This could be a good solution. In fact, I checked on CPAN, and Algorithm::Cluster is implemented that way: as a Perl interface to the C Clustering Library. That is something that I will certainly consider in the very near future.

Again, thank you for your comments.

Cheers!

lin0
Re^2: RFC: Fuzzy Clustering with Perl by BUU (Prior) on Nov 03, 2006 at 20:52 UTC
Just a nit:

`for my $i ( 0 .. $number_of_clusters-1 ) { ... }`

Should be:

`for my $i ( 0 .. $#number_of_clusters ) { ... }`

Update: Hrm, whups, I read that first line as "0..@number_of_clusters", apparently multiple times. It should not be "0 .. $#number_of_clusters" since @number_of_clusters doesn't exist. It might be more clearly written "1 .. $number_of_clusters" though, =]
Re^3: RFC: Fuzzy Clustering with Perl by lin0 (Curate) on Nov 03, 2006 at 21:34 UTC
Hi BUU,

I think that the first option is the correct one because I want the loop to run $number_of_clusters times. Because the loop starts at 0, it should go up to $number_of_clusters-1.

Cheers!

lin0