
Re: RFC: Fuzzy Clustering with Perl

by jhourcle (Prior)
on Nov 03, 2006 at 15:59 UTC ( #582110=note )


in reply to RFC: Fuzzy Clustering with Perl

I'm going to guess that this is a straight transfer of a C program to Perl --

First, you're using a whole lot of for loops to track array indexes:

for (my $i = 0; $i < $number_of_clusters; $i++) { ... }

In Perl, if you're just trying to iterate over a range, you can use the 'foreach' style loop, with the range operator:

for my $i ( 0 .. $number_of_clusters-1 ) { ... }

Even if we were doing this in C, for the type of loops you're dealing with (starting at 0, order of operations doesn't matter), I'd still change the code, to reduce the number of comparisons against non-0 values:

for (i = number_of_clusters; i--; ) { ... }

In some cases, we can use the 'foreach' style loops to eliminate the need for index values, but because you're using them to index multiple arrays within the loop, that's not possible in your case.
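To make the comparison concrete, here is a minimal sketch of the three loop styles side by side, written in Perl. The arrays @centers and @weights are made up for illustration and are not taken from the original script; the last loop shows the kind of case where the index is still needed because two arrays are walked together.

```perl
use strict;
use warnings;

# Hypothetical parallel arrays, standing in for the script's per-cluster data.
my @centers = (1.0, 2.5, 4.0);
my @weights = (0.2, 0.5, 0.3);
my $number_of_clusters = @centers;    # array in scalar context gives its length

# C-style loop: counts up from 0
my @c_style;
for (my $i = 0; $i < $number_of_clusters; $i++) {
    push @c_style, $i;
}

# Range loop: same indexes, same order
my @range;
for my $i ( 0 .. $number_of_clusters - 1 ) {
    push @range, $i;
}

# Countdown loop: the test sees the old value of $i, then $i is decremented,
# so the body sees the same indexes in reverse order
my @countdown;
for (my $i = $number_of_clusters; $i--; ) {
    push @countdown, $i;
}

# The index is still needed when two arrays are walked together.
my $weighted_sum = 0;
for my $i ( 0 .. $#centers ) {
    $weighted_sum += $centers[$i] * $weights[$i];
}

print "c-style:   @c_style\n";
print "range:     @range\n";
print "countdown: @countdown\n";
print "weighted sum: $weighted_sum\n";
```

Note that the countdown form only makes sense when the loop body doesn't care which order the indexes arrive in.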

...

Another change I might make is in how you deal with undefined values -- if the value must be defined, and can't be 0, (eg, $number_of_clusters), you can use the '||=' operator:

$number_of_clusters ||= 2;
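One caveat worth spelling out: '||=' replaces any false value, not just undef, which is exactly why it only fits values that can't legitimately be 0. A small sketch of the difference, using the defined-or operator '//=' (available from Perl 5.10 on) for comparison; the variable names here are only for illustration.

```perl
use strict;
use warnings;
use 5.010;    # enables say; // and //= need Perl 5.10 or later

# ||= assigns when the left side is false: undef, 0, '0' or ''
my $must_be_nonzero_a;
my $must_be_nonzero_b = 0;
$must_be_nonzero_a ||= 2;    # undef is false, so the default is applied
$must_be_nonzero_b ||= 2;    # 0 is false too, so the default is applied as well

# //= assigns only when the left side is undef
my $may_be_zero_a;
my $may_be_zero_b = 0;
$may_be_zero_a //= 2;        # undef, so the default is applied
$may_be_zero_b //= 2;        # defined, so the 0 is kept

say "||= : $must_be_nonzero_a $must_be_nonzero_b";
say "//= : $may_be_zero_a $may_be_zero_b";
```

For $number_of_clusters the '||=' behaviour is the one you want, since 0 clusters is not a usable setting anyway.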

...

The only other thing is in how it's called -- if it were OO, you could inherit from it, and then replace the 'distance' function (or you could have it accept a coderef for the distance function, if you didn't want to support inheritance), as some people prefer the Manhattan distance when they're dealing with clusters:

sub distance {
    my ($vec1, $vec2) = @_;
    my $distance = 0;
    # sum the absolute differences, coordinate by coordinate
    for my $i ( 0 .. (scalar @$vec1)-1 ) {
        $distance += abs( $vec1->[$i] - $vec2->[$i] );
    }
    return $distance;
}
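For the coderef route, a rough sketch might look like the following. The function name closest_center and its argument layout are hypothetical, not part of the original script; the point is only that the caller picks the metric and the clustering code never needs to know which one it got.

```perl
use strict;
use warnings;

# Euclidean distance between two equal-length vectors (array refs).
sub euclidean {
    my ($vec1, $vec2) = @_;
    my $sum = 0;
    $sum += ( $vec1->[$_] - $vec2->[$_] ) ** 2 for 0 .. $#$vec1;
    return sqrt $sum;
}

# Manhattan distance: sum of absolute coordinate differences.
sub manhattan {
    my ($vec1, $vec2) = @_;
    my $sum = 0;
    $sum += abs( $vec1->[$_] - $vec2->[$_] ) for 0 .. $#$vec1;
    return $sum;
}

# Hypothetical helper: index of the center closest to $point,
# measured with whatever distance coderef the caller passes in.
sub closest_center {
    my ($point, $centers, $distance) = @_;
    $distance ||= \&euclidean;    # default metric if none is given
    my ($best_index, $best_dist);
    for my $i ( 0 .. $#$centers ) {
        my $d = $distance->( $point, $centers->[$i] );
        if ( !defined $best_dist or $d < $best_dist ) {
            ($best_index, $best_dist) = ($i, $d);
        }
    }
    return $best_index;
}

# The two metrics can disagree about which center is closest.
my @centers = ( [0, 3], [2, 2] );
my $point   = [ 0, 0 ];
print "euclidean picks center ", closest_center( $point, \@centers ), "\n";
print "manhattan picks center ", closest_center( $point, \@centers, \&manhattan ), "\n";
```

Passing a coderef keeps the calling code procedural, which may be the easier step if you'd rather not restructure the script as a class yet.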


Re^2: RFC: Fuzzy Clustering with Perl
by BUU (Prior) on Nov 03, 2006 at 20:52 UTC
    Just a nit:
    for my $i ( 0 .. $number_of_clusters-1 ) { ... }

    Should be:
    for my $i ( 0 .. $#number_of_clusters ) { ... }

    Update: Hrm, whups, I read that first line as "0..@number_of_clusters", apparently multiple times. It should not be "0 .. $#number_of_clusters" since @number_of_clusters doesn't exist. It might be more clearly written "1 .. $number_of_clusters" though, =]

      Hi BUU,

      I think that the first option is the correct one because I want the loop to run $number_of_clusters times. Because the loop starts at 0, it should go up to $number_of_clusters-1.

      Cheers!

      lin0
Re^2: RFC: Fuzzy Clustering with Perl
by lin0 (Curate) on Nov 03, 2006 at 20:56 UTC

    jhourcle

    Thank you very much for your comments. I will work on a new version of the script, following your suggestions, and I will post it some time next week.

    I will address some of your comments below.

    "I'm going to guess that this is a straight transfer of a C program to Perl --"

    This is a very good guess. In fact, it is true! I have been programming in Perl for three months now and I recognize that I have a long way to go. This is one of the reasons I am asking for comments: so that I can improve my Perl coding skills in the least amount of time possible.

    "First, you're using a whole lot of for loops to track array indexes:"

    That is true. I am trying to overcome that habit.

    "In Perl, if you're just trying to iterate over a range, you can use the 'foreach' style loop, with the range operator:"
    for my $i ( 0 .. $number_of_clusters-1 ) { ... }
    "Even if we were doing this in C, for the type of loops you're dealing with (starting at 0, order of operations doesn't matter), I'd still change the code, to reduce the number of comparisons against non-0 values:"
    for (i = number_of_clusters; i--; ) { ... }

    I like both options. However, the first one is probably easier to understand for someone who is new to Perl. For the second one, you have to be clear that the value of $i is tested first, which decides whether the loop continues, and only then is the variable decremented. This might be hard to see for someone new to the language (I had to try it to see what it did).

    "Another change I might make is in how you deal with undefined values -- if the value must be defined, and can't be 0, (eg, $number_of_clusters), you can use the '||=' operator:"
    $number_of_clusters ||= 2;

    Thank you for the pointer. Trying the ||= operator made me realize that $number_of_clusters cannot be negative either. So maybe I should do

    my $number_of_clusters = abs(shift @ARGV);

    followed by the line you suggested. Is there another way around that?

    "The only other thing is in how it's called -- if it were OO, you could inherit from it, and then replace the 'distance' function (or you could have it accept a coderef for the distance function, if you didn't want to support inheritance), as some people prefer the Manhattan distance when they're dealing with clusters:"

    I have to think about this. I have to study OO in Perl, first.

    Thanks again.

    lin0
      Just out of curiosity, why are you implementing this in Perl? If the number of features goes beyond, say 5, for any reasonable dataset, this will be too slow to be of much use. I'd think this is the sort of thing you'd implement in C and then provide Perl bindings for...

        Hello

        Thank you for your comments. I will try to address them to the best of my knowledge

        “Just out of curiosity, why are you implementing this in Perl?”

        I am interested in developing a granular computing implementation using Perl. You can see this post I wrote on the topic. Clustering is an essential part of a granular computing implementation, and because I could not find any previous implementation of Fuzzy C-means in Perl, I decided to write one (basically I just ported code I had written in C to Perl). I also saw writing a Perl implementation of Fuzzy C-means as a learning opportunity. I have been programming in Perl for three months, so I decided this was a good starting project. Moreover, I need to gain a better understanding of how to program in Perl to be able to start my Granular Computing implementation. That is the final goal.

        "If the number of features goes beyond, say 5, for any reasonable dataset, this will be too slow to be of much use."

        That could certainly be the case. However, for the projects I am planning to use this for, I do not expect to have many more than 5 features. In any case, to make it more general, I will start thinking about how to speed up the processing.

        “I'd think this is the sort of thing you'd implement in C and then provide Perl bindings for...”

        This could be a good solution. In fact, I checked on CPAN and Algorithm::Cluster is implemented that way: as a Perl interface to the C Clustering Library. That is something that I will certainly consider in the very near future.

        Again, thank you for your comments

        Cheers!

        lin0
